{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "name": "glue_finetuning_and_submission.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kaa2BdMZz9Ua"
      },
      "source": [
        "# Fine-tuning Transformer models and test prediction for GLUE tasks, using *torchdistill*"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "v9cTkGsg0I6K"
      },
      "source": [
        "## 1. Make sure you have access to GPU/TPU\n",
        "Google Colab: Runtime -> Change runtime type -> Hardware accelarator: \"GPU\" or \"TPU\""
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "-7U6G5N8z06c",
        "outputId": "896204d4-ed18-4778-c5ee-0fad9ef26a19"
      },
      "source": [
        "!nvidia-smi"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Thu May 20 22:12:05 2021       \n",
            "+-----------------------------------------------------------------------------+\n",
            "| NVIDIA-SMI 465.19.01    Driver Version: 460.32.03    CUDA Version: 11.2     |\n",
            "|-------------------------------+----------------------+----------------------+\n",
            "| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
            "| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n",
            "|                               |                      |               MIG M. |\n",
            "|===============================+======================+======================|\n",
            "|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |\n",
            "| N/A   60C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |\n",
            "|                               |                      |                  N/A |\n",
            "+-------------------------------+----------------------+----------------------+\n",
            "                                                                               \n",
            "+-----------------------------------------------------------------------------+\n",
            "| Processes:                                                                  |\n",
            "|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |\n",
            "|        ID   ID                                                   Usage      |\n",
            "|=============================================================================|\n",
            "|  No running processes found                                                 |\n",
            "+-----------------------------------------------------------------------------+\n"
          ],
          "name": "stdout"
        }
      ]
    },
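    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a quick sanity check from Python (a minimal sketch; it only assumes that PyTorch is preinstalled on the Colab runtime), you can also confirm that PyTorch sees the accelerator you selected:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Minimal sketch: verify that PyTorch can see the GPU selected in the runtime settings.\n",
        "import torch\n",
        "\n",
        "print('CUDA available:', torch.cuda.is_available())\n",
        "if torch.cuda.is_available():\n",
        "    # Name of the first visible CUDA device (e.g. a Tesla T4 on Colab)\n",
        "    print('Device:', torch.cuda.get_device_name(0))"
      ],
      "execution_count": null,
      "outputs": []
    },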
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WtaTzdTy0mMg"
      },
      "source": [
        "## 2. Clone torchdistill repository to use its example code and configuration files"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "wR5GGkREVl3s",
        "outputId": "e8b62326-874c-4edf-c8e1-e54663df0110"
      },
      "source": [
        "!git clone https://github.com/yoshitomo-matsubara/torchdistill"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Cloning into 'torchdistill'...\n",
            "remote: Enumerating objects: 4979, done.\u001b[K\n",
            "remote: Counting objects: 100% (761/761), done.\u001b[K\n",
            "remote: Compressing objects: 100% (450/450), done.\u001b[K\n",
            "remote: Total 4979 (delta 436), reused 531 (delta 260), pack-reused 4218\u001b[K\n",
            "Receiving objects: 100% (4979/4979), 1.06 MiB | 6.51 MiB/s, done.\n",
            "Resolving deltas: 100% (3050/3050), done.\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZgzJAnV00UN8"
      },
      "source": [
        "## 3. Install dependencies and *torchdistill*"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Gz9Y_IpzevFw",
        "outputId": "899d2281-b1b2-46e0-83db-9cde0e4038a3"
      },
      "source": [
        "!pip install -r torchdistill/examples/hf_transformers/requirements.txt\n",
        "!pip install torchdistill"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Collecting accelerate\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/f7/fa/d173d923c953d930702066894abf128a7e5258c6f64cf088d2c5a83f46a3/accelerate-0.3.0-py3-none-any.whl (49kB)\n",
            "\u001b[K     |████████████████████████████████| 51kB 4.3MB/s \n",
            "\u001b[?25hCollecting datasets>=1.1.3\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/46/1a/b9f9b3bfef624686ae81c070f0a6bb635047b17cdb3698c7ad01281e6f9a/datasets-1.6.2-py3-none-any.whl (221kB)\n",
            "\u001b[K     |████████████████████████████████| 225kB 12.5MB/s \n",
            "\u001b[?25hCollecting sentencepiece!=0.1.92\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/f5/99/e0808cb947ba10f575839c43e8fafc9cc44e4a7a2c8f79c60db48220a577/sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_64.whl (1.2MB)\n",
            "\u001b[K     |████████████████████████████████| 1.2MB 44.3MB/s \n",
            "\u001b[?25hRequirement already satisfied: protobuf in /usr/local/lib/python3.7/dist-packages (from -r torchdistill/examples/hf_transformers/requirements.txt (line 4)) (3.12.4)\n",
            "Requirement already satisfied: torch>=1.8.1 in /usr/local/lib/python3.7/dist-packages (from -r torchdistill/examples/hf_transformers/requirements.txt (line 5)) (1.8.1+cu101)\n",
            "Collecting transformers>=4.6.1\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/d5/43/cfe4ee779bbd6a678ac6a97c5a5cdeb03c35f9eaebbb9720b036680f9a2d/transformers-4.6.1-py3-none-any.whl (2.2MB)\n",
            "\u001b[K     |████████████████████████████████| 2.3MB 44.9MB/s \n",
            "\u001b[?25hRequirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from -r torchdistill/examples/hf_transformers/requirements.txt (line 7)) (1.1.5)\n",
            "Collecting pyaml>=20.4.0\n",
            "  Downloading https://files.pythonhosted.org/packages/15/c4/1310a054d33abc318426a956e7d6df0df76a6ddfa9c66f6310274fb75d42/pyaml-20.4.0-py2.py3-none-any.whl\n",
            "Requirement already satisfied: dill in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (0.3.3)\n",
            "Collecting huggingface-hub<0.1.0\n",
            "  Downloading https://files.pythonhosted.org/packages/32/a1/7c5261396da23ec364e296a4fb8a1cd6a5a2ff457215c6447038f18c0309/huggingface_hub-0.0.9-py3-none-any.whl\n",
            "Requirement already satisfied: pyarrow>=1.0.0<4.0.0 in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (3.0.0)\n",
            "Requirement already satisfied: importlib-metadata; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (4.0.1)\n",
            "Collecting fsspec\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/bc/52/816d1a3a599176057bf29dfacb1f8fadb61d35fbd96cb1bab4aaa7df83c0/fsspec-2021.5.0-py3-none-any.whl (111kB)\n",
            "\u001b[K     |████████████████████████████████| 112kB 52.5MB/s \n",
            "\u001b[?25hCollecting xxhash\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/7d/4f/0a862cad26aa2ed7a7cd87178cbbfa824fc1383e472d63596a0d018374e7/xxhash-2.0.2-cp37-cp37m-manylinux2010_x86_64.whl (243kB)\n",
            "\u001b[K     |████████████████████████████████| 245kB 49.5MB/s \n",
            "\u001b[?25hRequirement already satisfied: multiprocess in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (0.70.11.1)\n",
            "Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (2.23.0)\n",
            "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (1.19.5)\n",
            "Requirement already satisfied: tqdm<4.50.0,>=4.27 in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (4.41.1)\n",
            "Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (20.9)\n",
            "Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from protobuf->-r torchdistill/examples/hf_transformers/requirements.txt (line 4)) (56.1.0)\n",
            "Requirement already satisfied: six>=1.9 in /usr/local/lib/python3.7/dist-packages (from protobuf->-r torchdistill/examples/hf_transformers/requirements.txt (line 4)) (1.15.0)\n",
            "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch>=1.8.1->-r torchdistill/examples/hf_transformers/requirements.txt (line 5)) (3.7.4.3)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers>=4.6.1->-r torchdistill/examples/hf_transformers/requirements.txt (line 6)) (3.0.12)\n",
            "Collecting tokenizers<0.11,>=0.10.1\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/ae/04/5b870f26a858552025a62f1649c20d29d2672c02ff3c3fb4c688ca46467a/tokenizers-0.10.2-cp37-cp37m-manylinux2010_x86_64.whl (3.3MB)\n",
            "\u001b[K     |████████████████████████████████| 3.3MB 49.3MB/s \n",
            "\u001b[?25hCollecting sacremoses\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/75/ee/67241dc87f266093c533a2d4d3d69438e57d7a90abb216fa076e7d475d4a/sacremoses-0.0.45-py3-none-any.whl (895kB)\n",
            "\u001b[K     |████████████████████████████████| 901kB 41.6MB/s \n",
            "\u001b[?25hRequirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers>=4.6.1->-r torchdistill/examples/hf_transformers/requirements.txt (line 6)) (2019.12.20)\n",
            "Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas->-r torchdistill/examples/hf_transformers/requirements.txt (line 7)) (2.8.1)\n",
            "Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas->-r torchdistill/examples/hf_transformers/requirements.txt (line 7)) (2018.9)\n",
            "Requirement already satisfied: PyYAML in /usr/local/lib/python3.7/dist-packages (from pyaml>=20.4.0->accelerate->-r torchdistill/examples/hf_transformers/requirements.txt (line 1)) (3.13)\n",
            "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < \"3.8\"->datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (3.4.1)\n",
            "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (2.10)\n",
            "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (1.24.3)\n",
            "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (3.0.4)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (2020.12.5)\n",
            "Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->datasets>=1.1.3->-r torchdistill/examples/hf_transformers/requirements.txt (line 2)) (2.4.7)\n",
            "Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers>=4.6.1->-r torchdistill/examples/hf_transformers/requirements.txt (line 6)) (1.0.1)\n",
            "Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers>=4.6.1->-r torchdistill/examples/hf_transformers/requirements.txt (line 6)) (8.0.0)\n",
            "\u001b[31mERROR: transformers 4.6.1 has requirement huggingface-hub==0.0.8, but you'll have huggingface-hub 0.0.9 which is incompatible.\u001b[0m\n",
            "Installing collected packages: pyaml, accelerate, huggingface-hub, fsspec, xxhash, datasets, sentencepiece, tokenizers, sacremoses, transformers\n",
            "Successfully installed accelerate-0.3.0 datasets-1.6.2 fsspec-2021.5.0 huggingface-hub-0.0.9 pyaml-20.4.0 sacremoses-0.0.45 sentencepiece-0.1.95 tokenizers-0.10.2 transformers-4.6.1 xxhash-2.0.2\n",
            "Collecting torchdistill\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/1d/1e/98c4591040d5ba7b849432e4bc6a575a8c87aa228fa043cbfb1ead9695be/torchdistill-0.2.0-py3-none-any.whl (78kB)\n",
            "\u001b[K     |████████████████████████████████| 81kB 5.7MB/s \n",
            "\u001b[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from torchdistill) (1.19.5)\n",
            "Requirement already satisfied: pycocotools>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from torchdistill) (2.0.2)\n",
            "Requirement already satisfied: torchvision>=0.8.2 in /usr/local/lib/python3.7/dist-packages (from torchdistill) (0.9.1+cu101)\n",
            "Collecting pyyaml>=5.4.1\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/7a/a5/393c087efdc78091afa2af9f1378762f9821c9c1d7a22c5753fb5ac5f97a/PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636kB)\n",
            "\u001b[K     |████████████████████████████████| 645kB 10.3MB/s \n",
            "\u001b[?25hRequirement already satisfied: torch>=1.7.1 in /usr/local/lib/python3.7/dist-packages (from torchdistill) (1.8.1+cu101)\n",
            "Requirement already satisfied: cython in /usr/local/lib/python3.7/dist-packages (from torchdistill) (0.29.23)\n",
            "Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from torchdistill) (1.4.1)\n",
            "Requirement already satisfied: matplotlib>=2.1.0 in /usr/local/lib/python3.7/dist-packages (from pycocotools>=2.0.1->torchdistill) (3.2.2)\n",
            "Requirement already satisfied: setuptools>=18.0 in /usr/local/lib/python3.7/dist-packages (from pycocotools>=2.0.1->torchdistill) (56.1.0)\n",
            "Requirement already satisfied: pillow>=4.1.1 in /usr/local/lib/python3.7/dist-packages (from torchvision>=0.8.2->torchdistill) (7.1.2)\n",
            "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch>=1.7.1->torchdistill) (3.7.4.3)\n",
            "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.1.0->pycocotools>=2.0.1->torchdistill) (1.3.1)\n",
            "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.1.0->pycocotools>=2.0.1->torchdistill) (0.10.0)\n",
            "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.1.0->pycocotools>=2.0.1->torchdistill) (2.4.7)\n",
            "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=2.1.0->pycocotools>=2.0.1->torchdistill) (2.8.1)\n",
            "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from cycler>=0.10->matplotlib>=2.1.0->pycocotools>=2.0.1->torchdistill) (1.15.0)\n",
            "Installing collected packages: pyyaml, torchdistill\n",
            "  Found existing installation: PyYAML 3.13\n",
            "    Uninstalling PyYAML-3.13:\n",
            "      Successfully uninstalled PyYAML-3.13\n",
            "Successfully installed pyyaml-5.4.1 torchdistill-0.2.0\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GgvtpJmHSGXr"
      },
      "source": [
        "## (Optional) Configure Accelerate for 2x-speedup training by mixed-precision\n",
        "\n",
        "If you are **NOT** using the Google Colab Pro, it will exceed 12 hours (maximum lifetimes for free Google Colab users) to fine-tune a base-sized model for the following 9 different tasks with Tesla K80.\n",
        "By using mixed-precision training, you can complete all the 9 fine-tuning jobs.\n",
        "[This table](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification#mixed-precision-training) gives you a good idea about how long it will take to fine-tune a BERT-Base on a Titan RTX with/without mixed-precision."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "MGI5L9W6SEfT",
        "outputId": "1ea7617b-ed2d-4311-b68a-a80806525c92"
      },
      "source": [
        "!accelerate config"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "In which compute environment are you running? ([0] This machine, [1] AWS (Amazon SageMaker)): 0\n",
            "Which type of machine are you using? ([0] No distributed training, [1] multi-GPU, [2] TPU): 0\n",
            "How many processes in total will you use? [1]: 1\n",
            "Do you wish to use FP16 (mixed precision)? [yes/NO]: yes\n"
          ],
          "name": "stdout"
        }
      ]
    },
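    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The sketch below (not part of torchdistill's example code) illustrates the mechanism behind the speedup: one FP16 training step in plain PyTorch using `torch.cuda.amp`. Accelerate applies the same autocast / gradient-scaling logic for you once FP16 is enabled in the config above; the tiny linear model here is just a hypothetical stand-in."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Minimal sketch: a single FP16 mixed-precision training step in plain PyTorch.\n",
        "# Accelerate performs the equivalent autocast/GradScaler handling once FP16 is enabled.\n",
        "import torch\n",
        "\n",
        "model = torch.nn.Linear(768, 2).cuda()       # hypothetical stand-in for a classifier head\n",
        "optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)\n",
        "scaler = torch.cuda.amp.GradScaler()         # scales the loss to avoid FP16 underflow\n",
        "\n",
        "inputs = torch.randn(8, 768, device='cuda')\n",
        "labels = torch.randint(0, 2, (8,), device='cuda')\n",
        "\n",
        "with torch.cuda.amp.autocast():              # run the forward pass in FP16 where safe\n",
        "    loss = torch.nn.functional.cross_entropy(model(inputs), labels)\n",
        "\n",
        "scaler.scale(loss).backward()                # backward pass on the scaled loss\n",
        "scaler.step(optimizer)                       # unscales gradients, then steps the optimizer\n",
        "scaler.update()"
      ],
      "execution_count": null,
      "outputs": []
    },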
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RxppgvR51Ij1"
      },
      "source": [
        "## 4. Fine-tune Transformer models for GLUE tasks\n",
        "The following examples demonstrate how to fine-tune pretrained BERT-Base (uncased) on each of datasets in GLUE.  \n",
        "**Note**: Test splits for GLUE tasks in `datasets` package are not labeled, and you use only training and validation spltis in this example, following [Hugging Face's example](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification)."
      ]
    },
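    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If you want to see why the test splits cannot be scored locally, the optional sketch below (not required by the training scripts) loads one GLUE task with the `datasets` package installed in step 3 and prints its splits; the test examples only carry a placeholder label."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Optional check (not needed for training): inspect the GLUE splits provided by `datasets`\n",
        "# and confirm that the test split carries no usable labels.\n",
        "from datasets import load_dataset\n",
        "\n",
        "raw_datasets = load_dataset('glue', 'cola')\n",
        "print(raw_datasets)                    # train / validation / test splits and their sizes\n",
        "print(raw_datasets['validation'][0])   # labeled example\n",
        "print(raw_datasets['test'][0])         # test example: the label is only a placeholder"
      ],
      "execution_count": null,
      "outputs": []
    },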
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bFHCWbIG1paE"
      },
      "source": [
        "### 4.1 CoLA task"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "4oFHTV4jV7yN",
        "outputId": "42c69cbb-173e-486b-b454-b8c20eb8c225"
      },
      "source": [
        "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
        "  --config torchdistill/configs/sample/glue/cola/ce/bert_base_uncased.yaml \\\n",
        "  --task cola \\\n",
        "  --log log/glue/cola/ce/bert_base_uncased.txt \\\n",
        "  --private_output leaderboard/glue/standard/bert_base_uncased/"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "2021-05-20 18:17:08.944023: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0\n",
            "2021/05/20 18:17:11\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/cola/ce/bert_base_uncased.yaml', log='log/glue/cola/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='cola', test_only=False, world_size=1)\n",
            "2021/05/20 18:17:11\tINFO\t__main__\tDistributed environment: NO\n",
            "Num processes: 1\n",
            "Process index: 0\n",
            "Local process index: 0\n",
            "Device: cuda\n",
            "Use FP16 precision: True\n",
            "\n",
            "2021/05/20 18:17:11\tINFO\tfilelock\tLock 139654060689104 acquired on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock\n",
            "https://huggingface.co/bert-base-uncased/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp4cjqmw52\n",
            "Downloading: 100% 570/570 [00:00<00:00, 574kB/s]\n",
            "storing https://huggingface.co/bert-base-uncased/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "creating metadata file for /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "2021/05/20 18:17:11\tINFO\tfilelock\tLock 139654060689104 released on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"cola\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "2021/05/20 18:17:11\tINFO\tfilelock\tLock 139654051602640 acquired on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
            "https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpu6e9dbdf\n",
            "Downloading: 100% 232k/232k [00:00<00:00, 17.2MB/s]\n",
            "storing https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "creating metadata file for /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "2021/05/20 18:17:11\tINFO\tfilelock\tLock 139654051602640 released on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
            "2021/05/20 18:17:11\tINFO\tfilelock\tLock 139654019186320 acquired on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock\n",
            "https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp6y9a3h_r\n",
            "Downloading: 100% 466k/466k [00:00<00:00, 18.8MB/s]\n",
            "storing https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "creating metadata file for /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "2021/05/20 18:17:11\tINFO\tfilelock\tLock 139654019186320 released on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock\n",
            "2021/05/20 18:17:11\tINFO\tfilelock\tLock 139654019242448 acquired on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock\n",
            "https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpaysf9gja\n",
            "Downloading: 100% 28.0/28.0 [00:00<00:00, 44.1kB/s]\n",
            "storing https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "creating metadata file for /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "2021/05/20 18:17:11\tINFO\tfilelock\tLock 139654019242448 released on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "2021/05/20 18:17:11\tINFO\tfilelock\tLock 139654056957968 acquired on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock\n",
            "https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpe7uqg_ef\n",
            "Downloading: 100% 440M/440M [00:07<00:00, 58.7MB/s]\n",
            "storing https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
            "creating metadata file for /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
            "2021/05/20 18:17:19\tINFO\tfilelock\tLock 139654056957968 released on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock\n",
            "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
            "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias']\n",
            "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
            "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
            "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
            "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
            "Downloading: 28.8kB [00:00, 23.1MB/s]       \n",
            "Downloading: 28.7kB [00:00, 29.9MB/s]       \n",
            "Downloading and preparing dataset glue/cola (download: 368.14 KiB, generated: 596.73 KiB, post-processed: Unknown size, total: 964.86 KiB) to /root/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
            "Downloading: 100% 377k/377k [00:00<00:00, 1.06MB/s]\n",
            "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
            "100% 9/9 [00:00<00:00, 27.04ba/s]\n",
            "100% 2/2 [00:00<00:00, 56.60ba/s]\n",
            "100% 2/2 [00:00<00:00, 58.15ba/s]\n",
            "Downloading: 5.75kB [00:00, 5.81MB/s]       \n",
            "2021/05/20 18:17:23\tINFO\t__main__\tStart training\n",
            "2021/05/20 18:17:23\tINFO\ttorchdistill.models.util\t[student model]\n",
            "2021/05/20 18:17:23\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
            "2021/05/20 18:17:23\tINFO\ttorchdistill.core.training\tLoss = 1.0 * OrgLoss\n",
            "2021/05/20 18:17:34\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [  0/268]  eta: 0:01:22  lr: 1.9975124378109453e-05  sample/s: 13.791053069840505  loss: 0.5859 (0.5859)  time: 0.3079  data: 0.0179  max mem: 1854\n",
            "2021/05/20 18:17:39\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 50/268]  eta: 0:00:22  lr: 1.873134328358209e-05  sample/s: 40.23583358115173  loss: 0.5961 (0.6075)  time: 0.1012  data: 0.0030  max mem: 2528\n",
            "2021/05/20 18:17:44\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [100/268]  eta: 0:00:17  lr: 1.7487562189054726e-05  sample/s: 41.58355020844797  loss: 0.5694 (0.5928)  time: 0.1027  data: 0.0029  max mem: 2528\n",
            "2021/05/20 18:17:49\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [150/268]  eta: 0:00:12  lr: 1.6243781094527366e-05  sample/s: 43.61604234421836  loss: 0.5189 (0.5766)  time: 0.1008  data: 0.0030  max mem: 2528\n",
            "2021/05/20 18:17:55\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [200/268]  eta: 0:00:07  lr: 1.5000000000000002e-05  sample/s: 37.373061980967314  loss: 0.4837 (0.5575)  time: 0.1044  data: 0.0030  max mem: 2528\n",
            "2021/05/20 18:18:00\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [250/268]  eta: 0:00:01  lr: 1.3756218905472638e-05  sample/s: 41.21598993750246  loss: 0.4830 (0.5410)  time: 0.1022  data: 0.0029  max mem: 2528\n",
            "2021/05/20 18:18:02\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:00:27\n",
            "2021/05/20 18:18:02\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow\n",
            "2021/05/20 18:18:02\tINFO\t__main__\tValidation: matthews_correlation = 0.5012202588868276\n",
            "2021/05/20 18:18:02\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/cola/ce/cola-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/cola/ce/cola-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/20 18:18:04\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [  0/268]  eta: 0:00:28  lr: 1.3308457711442788e-05  sample/s: 38.51589560873478  loss: 0.4412 (0.4412)  time: 0.1081  data: 0.0042  max mem: 2676\n",
            "2021/05/20 18:18:09\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 50/268]  eta: 0:00:22  lr: 1.2064676616915423e-05  sample/s: 41.419804321903555  loss: 0.4181 (0.4351)  time: 0.1038  data: 0.0031  max mem: 2676\n",
            "2021/05/20 18:18:14\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [100/268]  eta: 0:00:17  lr: 1.082089552238806e-05  sample/s: 36.973276975357344  loss: 0.4477 (0.4214)  time: 0.1048  data: 0.0030  max mem: 2676\n",
            "2021/05/20 18:18:19\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [150/268]  eta: 0:00:12  lr: 9.577114427860697e-06  sample/s: 41.486994216079744  loss: 0.4110 (0.4140)  time: 0.1031  data: 0.0030  max mem: 2676\n",
            "2021/05/20 18:18:24\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [200/268]  eta: 0:00:07  lr: 8.333333333333334e-06  sample/s: 40.86950039828797  loss: 0.3957 (0.4058)  time: 0.1039  data: 0.0030  max mem: 2676\n",
            "2021/05/20 18:18:30\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [250/268]  eta: 0:00:01  lr: 7.089552238805971e-06  sample/s: 36.91876523866947  loss: 0.3850 (0.4018)  time: 0.1065  data: 0.0030  max mem: 2676\n",
            "2021/05/20 18:18:31\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:00:27\n",
            "2021/05/20 18:18:32\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow\n",
            "2021/05/20 18:18:32\tINFO\t__main__\tValidation: matthews_correlation = 0.537773817191238\n",
            "2021/05/20 18:18:32\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/cola/ce/cola-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/cola/ce/cola-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/20 18:18:33\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [  0/268]  eta: 0:00:27  lr: 6.64179104477612e-06  sample/s: 40.27069856532833  loss: 0.2261 (0.2261)  time: 0.1034  data: 0.0041  max mem: 2676\n",
            "2021/05/20 18:18:38\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 50/268]  eta: 0:00:22  lr: 5.398009950248757e-06  sample/s: 40.69374211700786  loss: 0.3279 (0.3132)  time: 0.1031  data: 0.0030  max mem: 2676\n",
            "2021/05/20 18:18:44\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [100/268]  eta: 0:00:17  lr: 4.1542288557213935e-06  sample/s: 41.13675951353472  loss: 0.2553 (0.3008)  time: 0.1066  data: 0.0030  max mem: 2676\n",
            "2021/05/20 18:18:49\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [150/268]  eta: 0:00:12  lr: 2.9104477611940303e-06  sample/s: 40.5639665473079  loss: 0.2848 (0.2969)  time: 0.1046  data: 0.0029  max mem: 2676\n",
            "2021/05/20 18:18:54\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [200/268]  eta: 0:00:07  lr: 1.6666666666666667e-06  sample/s: 40.78029192644718  loss: 0.2972 (0.3016)  time: 0.1060  data: 0.0030  max mem: 2676\n",
            "2021/05/20 18:18:59\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [250/268]  eta: 0:00:01  lr: 4.2288557213930354e-07  sample/s: 36.69091847490913  loss: 0.2643 (0.3017)  time: 0.1051  data: 0.0030  max mem: 2676\n",
            "2021/05/20 18:19:01\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:00:28\n",
            "2021/05/20 18:19:02\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow\n",
            "2021/05/20 18:19:02\tINFO\t__main__\tValidation: matthews_correlation = 0.530657868197394\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"cola\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file ./resource/ckpt/glue/cola/ce/cola-bert-base-uncased/pytorch_model.bin\n",
            "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
            "\n",
            "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/cola/ce/cola-bert-base-uncased.\n",
            "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
            "2021/05/20 18:19:03\tINFO\t__main__\t[Student: bert-base-uncased]\n",
            "2021/05/20 18:19:04\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow\n",
            "2021/05/20 18:19:04\tINFO\t__main__\tTest: matthews_correlation = 0.537773817191238\n",
            "2021/05/20 18:19:04\tINFO\t__main__\tStart prediction for private dataset(s)\n",
            "2021/05/20 18:19:04\tINFO\t__main__\tcola/test: 1063 samples\n"
          ],
          "name": "stdout"
        }
      ]
    },
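    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The training log above shows the best checkpoint being saved under `./resource/ckpt/glue/cola/ce/cola-bert-base-uncased/`. As an optional follow-up (a minimal sketch, not part of the example script), you can reload that checkpoint with `transformers` and run a quick prediction:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Minimal sketch: reload the fine-tuned CoLA checkpoint saved by the script above and\n",
        "# classify one sentence. The checkpoint path comes from the training log printed above.\n",
        "import torch\n",
        "from transformers import AutoTokenizer, BertForSequenceClassification\n",
        "\n",
        "ckpt_dir = './resource/ckpt/glue/cola/ce/cola-bert-base-uncased'\n",
        "tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')\n",
        "model = BertForSequenceClassification.from_pretrained(ckpt_dir).eval()\n",
        "\n",
        "inputs = tokenizer('The book was written by John.', return_tensors='pt')\n",
        "with torch.no_grad():\n",
        "    logits = model(**inputs).logits\n",
        "print('Predicted class:', logits.argmax(dim=-1).item())  # CoLA convention: 1 = acceptable, 0 = unacceptable"
      ],
      "execution_count": null,
      "outputs": []
    },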
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6MzjOFPY1w1r"
      },
      "source": [
        "### 4.2 SST-2 task"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "acMDi9f3pd50",
        "outputId": "d2585f69-9f9c-4327-d374-e7cade4dfb03"
      },
      "source": [
        "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
        "  --config torchdistill/configs/sample/glue/sst2/ce/bert_base_uncased.yaml \\\n",
        "  --task sst2 \\\n",
        "  --log log/glue/sst2/ce/bert_base_uncased.txt \\\n",
        "  --private_output leaderboard/glue/standard/bert_base_uncased/"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "2021-05-20 18:19:08.498593: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0\n",
            "2021/05/20 18:19:10\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/sst2/ce/bert_base_uncased.yaml', log='log/glue/sst2/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='sst2', test_only=False, world_size=1)\n",
            "2021/05/20 18:19:10\tINFO\t__main__\tDistributed environment: NO\n",
            "Num processes: 1\n",
            "Process index: 0\n",
            "Local process index: 0\n",
            "Device: cuda\n",
            "Use FP16 precision: True\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"sst2\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
            "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight']\n",
            "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
            "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
            "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']\n",
            "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
            "Downloading and preparing dataset glue/sst2 (download: 7.09 MiB, generated: 4.81 MiB, post-processed: Unknown size, total: 11.90 MiB) to /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
            "Downloading: 100% 7.44M/7.44M [00:00<00:00, 9.98MB/s]\n",
            "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
            "100% 68/68 [00:03<00:00, 22.43ba/s]\n",
            "100% 1/1 [00:00<00:00,  6.09ba/s]\n",
            "100% 2/2 [00:00<00:00, 16.37ba/s]\n",
            "2021/05/20 18:19:19\tINFO\t__main__\tStart training\n",
            "2021/05/20 18:19:19\tINFO\ttorchdistill.models.util\t[student model]\n",
            "2021/05/20 18:19:19\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
            "2021/05/20 18:19:19\tINFO\ttorchdistill.core.training\tLoss = 1.0 * OrgLoss\n",
            "2021/05/20 18:19:22\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [   0/2105]  eta: 0:06:18  lr: 1.9996832937450518e-05  sample/s: 23.16939806189935  loss: 0.6759 (0.6759)  time: 0.1797  data: 0.0070  max mem: 1854\n",
            "2021/05/20 18:20:26\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 500/2105]  eta: 0:03:25  lr: 1.8413301662707842e-05  sample/s: 33.66539313577552  loss: 0.2753 (0.3411)  time: 0.1201  data: 0.0032  max mem: 3003\n",
            "2021/05/20 18:21:30\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [1000/2105]  eta: 0:02:21  lr: 1.6829770387965163e-05  sample/s: 30.618821837507163  loss: 0.1658 (0.2786)  time: 0.1255  data: 0.0031  max mem: 3003\n",
            "2021/05/20 18:22:35\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [1500/2105]  eta: 0:01:17  lr: 1.5246239113222487e-05  sample/s: 33.56362843419097  loss: 0.1224 (0.2492)  time: 0.1267  data: 0.0032  max mem: 3148\n",
            "2021/05/20 18:23:39\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [2000/2105]  eta: 0:00:13  lr: 1.3662707838479811e-05  sample/s: 33.56537431202572  loss: 0.1193 (0.2318)  time: 0.1301  data: 0.0034  max mem: 3151\n",
            "2021/05/20 18:23:53\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:04:30\n",
            "2021/05/20 18:23:54\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/sst2/default_experiment-1-0.arrow\n",
            "2021/05/20 18:23:54\tINFO\t__main__\tValidation: accuracy = 0.911697247706422\n",
            "2021/05/20 18:23:54\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/sst2/ce/sst2-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/sst2/ce/sst2-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/20 18:23:55\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [   0/2105]  eta: 0:04:23  lr: 1.3330166270783848e-05  sample/s: 33.88316648120061  loss: 0.0867 (0.0867)  time: 0.1254  data: 0.0073  max mem: 3151\n",
            "2021/05/20 18:24:59\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 500/2105]  eta: 0:03:25  lr: 1.1746634996041172e-05  sample/s: 33.62962784786783  loss: 0.1411 (0.1302)  time: 0.1323  data: 0.0033  max mem: 3151\n",
            "2021/05/20 18:26:03\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [1000/2105]  eta: 0:02:21  lr: 1.0163103721298497e-05  sample/s: 33.66823061949389  loss: 0.1055 (0.1312)  time: 0.1279  data: 0.0033  max mem: 3151\n",
            "2021/05/20 18:27:08\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [1500/2105]  eta: 0:01:17  lr: 8.57957244655582e-06  sample/s: 30.487732012342494  loss: 0.0543 (0.1309)  time: 0.1302  data: 0.0034  max mem: 3151\n",
            "2021/05/20 18:28:13\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [2000/2105]  eta: 0:00:13  lr: 6.996041171813144e-06  sample/s: 27.549201876546615  loss: 0.0852 (0.1290)  time: 0.1344  data: 0.0033  max mem: 3151\n",
            "2021/05/20 18:28:26\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:04:31\n",
            "2021/05/20 18:28:27\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/sst2/default_experiment-1-0.arrow\n",
            "2021/05/20 18:28:27\tINFO\t__main__\tValidation: accuracy = 0.9048165137614679\n",
            "2021/05/20 18:28:27\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [   0/2105]  eta: 0:04:04  lr: 6.6634996041171816e-06  sample/s: 36.15265070054691  loss: 0.0280 (0.0280)  time: 0.1163  data: 0.0057  max mem: 3151\n",
            "2021/05/20 18:29:32\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 500/2105]  eta: 0:03:25  lr: 5.079968329374505e-06  sample/s: 30.692424774616555  loss: 0.0018 (0.1044)  time: 0.1281  data: 0.0032  max mem: 3151\n",
            "2021/05/20 18:30:36\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [1000/2105]  eta: 0:02:21  lr: 3.4964370546318295e-06  sample/s: 30.540789095238267  loss: 0.0037 (0.1127)  time: 0.1298  data: 0.0032  max mem: 3151\n",
            "2021/05/20 18:31:40\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [1500/2105]  eta: 0:01:17  lr: 1.9129057798891528e-06  sample/s: 33.63690970259075  loss: 0.1440 (0.1193)  time: 0.1241  data: 0.0032  max mem: 3151\n",
            "2021/05/20 18:32:44\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [2000/2105]  eta: 0:00:13  lr: 3.293745051464767e-07  sample/s: 33.58506157615746  loss: 0.0188 (0.1223)  time: 0.1277  data: 0.0034  max mem: 3151\n",
            "2021/05/20 18:32:57\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:04:30\n",
            "2021/05/20 18:32:58\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/sst2/default_experiment-1-0.arrow\n",
            "2021/05/20 18:32:58\tINFO\t__main__\tValidation: accuracy = 0.9105504587155964\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"sst2\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file ./resource/ckpt/glue/sst2/ce/sst2-bert-base-uncased/pytorch_model.bin\n",
            "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
            "\n",
            "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/sst2/ce/sst2-bert-base-uncased.\n",
            "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
            "2021/05/20 18:33:00\tINFO\t__main__\t[Student: bert-base-uncased]\n",
            "2021/05/20 18:33:01\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/sst2/default_experiment-1-0.arrow\n",
            "2021/05/20 18:33:01\tINFO\t__main__\tTest: accuracy = 0.911697247706422\n",
            "2021/05/20 18:33:01\tINFO\t__main__\tStart prediction for private dataset(s)\n",
            "2021/05/20 18:33:01\tINFO\t__main__\tsst2/test: 1821 samples\n"
          ],
          "name": "stdout"
        }
      ]
    },
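    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The best SST-2 checkpoint (config + weights) is saved under `./resource/ckpt/glue/sst2/ce/sst2-bert-base-uncased/`, as shown in the log above. The next cell is a minimal, optional sketch for reloading that checkpoint with the `transformers` Auto classes and classifying one example sentence; it reuses the original `bert-base-uncased` tokenizer (only the model files are written to the checkpoint directory) and assumes the standard GLUE SST-2 label convention (0 = negative, 1 = positive). The sentence is just a placeholder."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Optional sanity-check sketch: reload the fine-tuned SST-2 checkpoint saved by the run above.\n",
        "# Checkpoint path taken from the training log; the tokenizer is the original bert-base-uncased one.\n",
        "import torch\n",
        "from transformers import AutoModelForSequenceClassification, AutoTokenizer\n",
        "\n",
        "ckpt_dir = './resource/ckpt/glue/sst2/ce/sst2-bert-base-uncased'\n",
        "tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')\n",
        "model = AutoModelForSequenceClassification.from_pretrained(ckpt_dir).eval()\n",
        "\n",
        "# Any sentence works here; this one is only a placeholder.\n",
        "inputs = tokenizer('A charming and often affecting journey.', return_tensors='pt')\n",
        "with torch.no_grad():\n",
        "    logits = model(**inputs).logits\n",
        "# GLUE SST-2 convention: label 0 = negative, 1 = positive\n",
        "print('predicted label id:', logits.argmax(dim=-1).item())"
      ],
      "execution_count": null,
      "outputs": []
    },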
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pjKsN2wz10Lb"
      },
      "source": [
        "### 4.3 MRPC task"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "NTHMMfEWpsdN",
        "outputId": "7c0ac7fa-c989-46e3-a747-e4120e49283e"
      },
      "source": [
        "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
        "  --config torchdistill/configs/sample/glue/mrpc/ce/bert_base_uncased.yaml \\\n",
        "  --task mrpc \\\n",
        "  --log log/glue/mrpc/ce/bert_base_uncased.txt \\\n",
        "  --private_output leaderboard/glue/standard/bert_base_uncased/"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "2021-05-20 18:33:06.898256: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0\n",
            "2021/05/20 18:33:08\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mrpc/ce/bert_base_uncased.yaml', log='log/glue/mrpc/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='mrpc', test_only=False, world_size=1)\n",
            "2021/05/20 18:33:08\tINFO\t__main__\tDistributed environment: NO\n",
            "Num processes: 1\n",
            "Process index: 0\n",
            "Local process index: 0\n",
            "Device: cuda\n",
            "Use FP16 precision: True\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"mrpc\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
            "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight']\n",
            "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
            "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
            "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
            "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
            "Downloading and preparing dataset glue/mrpc (download: 1.43 MiB, generated: 1.43 MiB, post-processed: Unknown size, total: 2.85 MiB) to /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
            "Downloading: 6.22kB [00:00, 4.49MB/s]\n",
            "Downloading: 1.05MB [00:00, 2.24MB/s]\n",
            "Downloading: 441kB [00:00, 1.24MB/s]\n",
            "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
            "100% 4/4 [00:00<00:00,  6.79ba/s]\n",
            "100% 1/1 [00:00<00:00, 18.58ba/s]\n",
            "100% 2/2 [00:00<00:00,  8.57ba/s]\n",
            "2021/05/20 18:33:15\tINFO\t__main__\tStart training\n",
            "2021/05/20 18:33:15\tINFO\ttorchdistill.models.util\t[student model]\n",
            "2021/05/20 18:33:15\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
            "2021/05/20 18:33:15\tINFO\ttorchdistill.core.training\tLoss = 1.0 * OrgLoss\n",
            "2021/05/20 18:33:19\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [  0/115]  eta: 0:00:27  lr: 1.996521739130435e-05  sample/s: 17.34970418912868  loss: 0.8980 (0.8980)  time: 0.2365  data: 0.0059  max mem: 2057\n",
            "2021/05/20 18:33:29\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 50/115]  eta: 0:00:12  lr: 1.822608695652174e-05  sample/s: 21.818316121573417  loss: 0.6190 (0.6291)  time: 0.1950  data: 0.0046  max mem: 3895\n",
            "2021/05/20 18:33:39\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [100/115]  eta: 0:00:02  lr: 1.6486956521739132e-05  sample/s: 20.084031742345434  loss: 0.6250 (0.6227)  time: 0.1943  data: 0.0045  max mem: 3895\n",
            "2021/05/20 18:33:41\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:00:22\n",
            "2021/05/20 18:33:42\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow\n",
            "2021/05/20 18:33:42\tINFO\t__main__\tValidation: accuracy = 0.6862745098039216, f1 = 0.8134110787172011\n",
            "2021/05/20 18:33:42\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/mrpc/ce/mrpc-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/mrpc/ce/mrpc-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/20 18:33:43\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [  0/115]  eta: 0:00:21  lr: 1.596521739130435e-05  sample/s: 21.732454737049363  loss: 0.6689 (0.6689)  time: 0.1905  data: 0.0065  max mem: 3895\n",
            "2021/05/20 18:33:53\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 50/115]  eta: 0:00:12  lr: 1.4226086956521742e-05  sample/s: 21.13491934518748  loss: 0.5636 (0.5817)  time: 0.1988  data: 0.0045  max mem: 3895\n",
            "2021/05/20 18:34:03\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [100/115]  eta: 0:00:02  lr: 1.2486956521739131e-05  sample/s: 21.419891886923015  loss: 0.5409 (0.5654)  time: 0.2009  data: 0.0045  max mem: 3895\n",
            "2021/05/20 18:34:06\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:00:22\n",
            "2021/05/20 18:34:07\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow\n",
            "2021/05/20 18:34:07\tINFO\t__main__\tValidation: accuracy = 0.7573529411764706, f1 = 0.8395461912479741\n",
            "2021/05/20 18:34:07\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/mrpc/ce/mrpc-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/mrpc/ce/mrpc-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/20 18:34:08\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [  0/115]  eta: 0:00:22  lr: 1.196521739130435e-05  sample/s: 21.714396652477056  loss: 0.5780 (0.5780)  time: 0.1934  data: 0.0092  max mem: 3895\n",
            "2021/05/20 18:34:18\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 50/115]  eta: 0:00:12  lr: 1.022608695652174e-05  sample/s: 21.650115817659774  loss: 0.5047 (0.5288)  time: 0.1980  data: 0.0048  max mem: 3895\n",
            "2021/05/20 18:34:28\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [100/115]  eta: 0:00:02  lr: 8.48695652173913e-06  sample/s: 21.80776604885074  loss: 0.5116 (0.5167)  time: 0.1939  data: 0.0045  max mem: 3895\n",
            "2021/05/20 18:34:31\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:00:22\n",
            "2021/05/20 18:34:32\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow\n",
            "2021/05/20 18:34:32\tINFO\t__main__\tValidation: accuracy = 0.7647058823529411, f1 = 0.8421052631578948\n",
            "2021/05/20 18:34:32\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/mrpc/ce/mrpc-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/mrpc/ce/mrpc-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/20 18:34:33\tINFO\ttorchdistill.misc.log\tEpoch: [3]  [  0/115]  eta: 0:00:21  lr: 7.965217391304349e-06  sample/s: 22.17703595311647  loss: 0.5962 (0.5962)  time: 0.1871  data: 0.0067  max mem: 3895\n",
            "2021/05/20 18:34:43\tINFO\ttorchdistill.misc.log\tEpoch: [3]  [ 50/115]  eta: 0:00:12  lr: 6.226086956521739e-06  sample/s: 21.844480033332033  loss: 0.4726 (0.4944)  time: 0.1989  data: 0.0046  max mem: 3895\n",
            "2021/05/20 18:34:52\tINFO\ttorchdistill.misc.log\tEpoch: [3]  [100/115]  eta: 0:00:02  lr: 4.486956521739131e-06  sample/s: 21.716026835004588  loss: 0.4394 (0.4757)  time: 0.1917  data: 0.0045  max mem: 3895\n",
            "2021/05/20 18:34:55\tINFO\ttorchdistill.misc.log\tEpoch: [3] Total time: 0:00:22\n",
            "2021/05/20 18:34:56\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow\n",
            "2021/05/20 18:34:56\tINFO\t__main__\tValidation: accuracy = 0.7794117647058824, f1 = 0.8500000000000001\n",
            "2021/05/20 18:34:56\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/mrpc/ce/mrpc-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/mrpc/ce/mrpc-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/20 18:34:57\tINFO\ttorchdistill.misc.log\tEpoch: [4]  [  0/115]  eta: 0:00:23  lr: 3.965217391304348e-06  sample/s: 20.163759783089436  loss: 0.4471 (0.4471)  time: 0.2056  data: 0.0072  max mem: 3895\n",
            "2021/05/20 18:35:07\tINFO\ttorchdistill.misc.log\tEpoch: [4]  [ 50/115]  eta: 0:00:12  lr: 2.2260869565217395e-06  sample/s: 17.954837908759732  loss: 0.4003 (0.4478)  time: 0.1972  data: 0.0045  max mem: 3895\n",
            "2021/05/20 18:35:17\tINFO\ttorchdistill.misc.log\tEpoch: [4]  [100/115]  eta: 0:00:02  lr: 4.869565217391305e-07  sample/s: 21.44817896732509  loss: 0.4012 (0.4355)  time: 0.1995  data: 0.0048  max mem: 3895\n",
            "2021/05/20 18:35:20\tINFO\ttorchdistill.misc.log\tEpoch: [4] Total time: 0:00:22\n",
            "2021/05/20 18:35:21\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow\n",
            "2021/05/20 18:35:21\tINFO\t__main__\tValidation: accuracy = 0.7892156862745098, f1 = 0.8585526315789473\n",
            "2021/05/20 18:35:21\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/mrpc/ce/mrpc-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/mrpc/ce/mrpc-bert-base-uncased/pytorch_model.bin\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"mrpc\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file ./resource/ckpt/glue/mrpc/ce/mrpc-bert-base-uncased/pytorch_model.bin\n",
            "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
            "\n",
            "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/mrpc/ce/mrpc-bert-base-uncased.\n",
            "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
            "2021/05/20 18:35:23\tINFO\t__main__\t[Student: bert-base-uncased]\n",
            "2021/05/20 18:35:24\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mrpc/default_experiment-1-0.arrow\n",
            "2021/05/20 18:35:24\tINFO\t__main__\tTest: accuracy = 0.7892156862745098, f1 = 0.8585526315789473\n",
            "2021/05/20 18:35:24\tINFO\t__main__\tStart prediction for private dataset(s)\n",
            "2021/05/20 18:35:24\tINFO\t__main__\tmrpc/test: 1725 samples\n"
          ],
          "name": "stdout"
        }
      ]
    },
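    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The MRPC log above reports both accuracy and F1 on the validation split; those numbers come from the GLUE metric in the `datasets` library. Below is a minimal sketch of that metric call with toy predictions and references (purely for illustration), where label 1 means the two sentences are paraphrases and 0 means they are not."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Sketch: how MRPC validation scores (accuracy + F1) are computed with the datasets GLUE metric.\n",
        "# The predictions/references below are toy values, purely for illustration.\n",
        "from datasets import load_metric\n",
        "\n",
        "metric = load_metric('glue', 'mrpc')\n",
        "predictions = [1, 0, 1, 1]  # model outputs (1 = paraphrase, 0 = not a paraphrase)\n",
        "references = [1, 0, 0, 1]   # gold labels\n",
        "print(metric.compute(predictions=predictions, references=references))  # {'accuracy': ..., 'f1': ...}"
      ],
      "execution_count": null,
      "outputs": []
    },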
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oCFuvFRv14Ky"
      },
      "source": [
        "### 4.4 STS-B task"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "tDUYbl2fpxu8",
        "outputId": "62650311-5a9e-4b98-fdbe-e96aeac6cb40"
      },
      "source": [
        "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
        "  --config torchdistill/configs/sample/glue/stsb/mse/bert_base_uncased.yaml \\\n",
        "  --task stsb \\\n",
        "  --log log/glue/stsb/mse/bert_base_uncased.txt \\\n",
        "  --private_output leaderboard/glue/standard/bert_base_uncased/"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "2021-05-20 18:35:31.299454: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0\n",
            "2021/05/20 18:35:33\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/stsb/mse/bert_base_uncased.yaml', log='log/glue/stsb/mse/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='stsb', test_only=False, world_size=1)\n",
            "2021/05/20 18:35:33\tINFO\t__main__\tDistributed environment: NO\n",
            "Num processes: 1\n",
            "Process index: 0\n",
            "Local process index: 0\n",
            "Device: cuda\n",
            "Use FP16 precision: True\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"stsb\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"id2label\": {\n",
            "    \"0\": \"LABEL_0\"\n",
            "  },\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"label2id\": {\n",
            "    \"LABEL_0\": 0\n",
            "  },\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
            "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight']\n",
            "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
            "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
            "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
            "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
            "Downloading and preparing dataset glue/stsb (download: 784.05 KiB, generated: 1.09 MiB, post-processed: Unknown size, total: 1.86 MiB) to /root/.cache/huggingface/datasets/glue/stsb/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
            "Downloading: 100% 803k/803k [00:00<00:00, 1.79MB/s]\n",
            "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/stsb/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
            "100% 6/6 [00:00<00:00, 12.37ba/s]\n",
            "100% 2/2 [00:00<00:00, 16.13ba/s]\n",
            "100% 2/2 [00:00<00:00, 19.27ba/s]\n",
            "2021/05/20 18:35:37\tINFO\t__main__\tStart training\n",
            "2021/05/20 18:35:37\tINFO\ttorchdistill.models.util\t[student model]\n",
            "2021/05/20 18:35:37\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
            "2021/05/20 18:35:37\tINFO\ttorchdistill.core.training\tLoss = 1.0 * OrgLoss\n",
            "/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n",
            "  \"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\", UserWarning)\n",
            "2021/05/20 18:35:41\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [  0/180]  eta: 0:00:31  lr: 1.9962962962962963e-05  sample/s: 23.975397560334283  loss: 7.4612 (7.4612)  time: 0.1726  data: 0.0058  max mem: 1721\n",
            "2021/05/20 18:35:50\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 50/180]  eta: 0:00:22  lr: 1.8111111111111112e-05  sample/s: 23.352025477139613  loss: 3.4428 (5.5616)  time: 0.1773  data: 0.0039  max mem: 3906\n",
            "2021/05/20 18:35:58\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [100/180]  eta: 0:00:13  lr: 1.625925925925926e-05  sample/s: 24.93307375648323  loss: 0.8947 (3.4780)  time: 0.1695  data: 0.0037  max mem: 3908\n",
            "2021/05/20 18:36:07\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [150/180]  eta: 0:00:05  lr: 1.4407407407407407e-05  sample/s: 22.87865618092138  loss: 0.6094 (2.5537)  time: 0.1755  data: 0.0038  max mem: 3908\n",
            "2021/05/20 18:36:12\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:00:31\n",
            "2021/05/20 18:36:14\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow\n",
            "2021/05/20 18:36:14\tINFO\t__main__\tValidation: pearson = 0.8751215666586938, spearmanr = 0.8728861125675298\n",
            "2021/05/20 18:36:14\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/stsb/mse/stsb-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/stsb/mse/stsb-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/20 18:36:16\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [  0/180]  eta: 0:00:37  lr: 1.3296296296296298e-05  sample/s: 19.926191969290993  loss: 0.4989 (0.4989)  time: 0.2064  data: 0.0056  max mem: 4467\n",
            "2021/05/20 18:36:24\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 50/180]  eta: 0:00:22  lr: 1.1444444444444444e-05  sample/s: 21.830779480346436  loss: 0.4852 (0.4897)  time: 0.1731  data: 0.0037  max mem: 4467\n",
            "2021/05/20 18:36:33\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [100/180]  eta: 0:00:13  lr: 9.592592592592593e-06  sample/s: 25.520637422079165  loss: 0.4093 (0.4633)  time: 0.1695  data: 0.0039  max mem: 4467\n",
            "2021/05/20 18:36:42\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [150/180]  eta: 0:00:05  lr: 7.74074074074074e-06  sample/s: 23.278894970896552  loss: 0.3982 (0.4616)  time: 0.1793  data: 0.0039  max mem: 4467\n",
            "2021/05/20 18:36:47\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:00:31\n",
            "2021/05/20 18:36:49\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow\n",
            "2021/05/20 18:36:49\tINFO\t__main__\tValidation: pearson = 0.888102252887114, spearmanr = 0.885858978153714\n",
            "2021/05/20 18:36:49\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/stsb/mse/stsb-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/stsb/mse/stsb-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/20 18:36:50\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [  0/180]  eta: 0:00:31  lr: 6.62962962962963e-06  sample/s: 23.430262454105794  loss: 0.2197 (0.2197)  time: 0.1757  data: 0.0050  max mem: 4467\n",
            "2021/05/20 18:36:59\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 50/180]  eta: 0:00:22  lr: 4.777777777777778e-06  sample/s: 23.277021711713836  loss: 0.2884 (0.2957)  time: 0.1716  data: 0.0039  max mem: 4467\n",
            "2021/05/20 18:37:08\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [100/180]  eta: 0:00:13  lr: 2.9259259259259257e-06  sample/s: 25.67117389368412  loss: 0.3315 (0.2932)  time: 0.1732  data: 0.0037  max mem: 4467\n",
            "2021/05/20 18:37:17\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [150/180]  eta: 0:00:05  lr: 1.074074074074074e-06  sample/s: 23.177304023960303  loss: 0.2477 (0.2896)  time: 0.1792  data: 0.0040  max mem: 4467\n",
            "2021/05/20 18:37:22\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:00:31\n",
            "2021/05/20 18:37:23\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow\n",
            "2021/05/20 18:37:23\tINFO\t__main__\tValidation: pearson = 0.8895822159614317, spearmanr = 0.8868203175999708\n",
            "2021/05/20 18:37:23\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/stsb/mse/stsb-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/stsb/mse/stsb-bert-base-uncased/pytorch_model.bin\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"stsb\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"id2label\": {\n",
            "    \"0\": \"LABEL_0\"\n",
            "  },\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"label2id\": {\n",
            "    \"LABEL_0\": 0\n",
            "  },\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file ./resource/ckpt/glue/stsb/mse/stsb-bert-base-uncased/pytorch_model.bin\n",
            "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
            "\n",
            "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/stsb/mse/stsb-bert-base-uncased.\n",
            "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
            "2021/05/20 18:37:27\tINFO\t__main__\t[Student: bert-base-uncased]\n",
            "2021/05/20 18:37:28\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/stsb/default_experiment-1-0.arrow\n",
            "2021/05/20 18:37:28\tINFO\t__main__\tTest: pearson = 0.8895822159614317, spearmanr = 0.8868203175999708\n",
            "2021/05/20 18:37:28\tINFO\t__main__\tStart prediction for private dataset(s)\n",
            "2021/05/20 18:37:28\tINFO\t__main__\tstsb/test: 1379 samples\n"
          ],
          "name": "stdout"
        }
      ]
    },
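    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Unlike the classification tasks above, STS-B is a regression task: the saved config has a single label (`LABEL_0`) and the model is trained with an MSE loss, so it outputs one similarity score per sentence pair (gold scores lie in [0, 5]). The next cell is a minimal, optional sketch for scoring a pair with the checkpoint saved above; the two sentences are only placeholders."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Sketch: score a sentence pair with the fine-tuned STS-B checkpoint (regression head with 1 output).\n",
        "# Checkpoint path taken from the training log; gold STS-B similarity scores range from 0 to 5.\n",
        "import torch\n",
        "from transformers import AutoModelForSequenceClassification, AutoTokenizer\n",
        "\n",
        "ckpt_dir = './resource/ckpt/glue/stsb/mse/stsb-bert-base-uncased'\n",
        "tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')\n",
        "model = AutoModelForSequenceClassification.from_pretrained(ckpt_dir).eval()\n",
        "\n",
        "# Placeholder sentence pair; replace with any two sentences.\n",
        "inputs = tokenizer('A man is playing a guitar.', 'A person plays a guitar.', return_tensors='pt')\n",
        "with torch.no_grad():\n",
        "    score = model(**inputs).logits.squeeze().item()\n",
        "print('predicted similarity score:', score)"
      ],
      "execution_count": null,
      "outputs": []
    },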
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sxS1o7i118Eq"
      },
      "source": [
        "### 4.5 QQP task"
      ]
    },
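    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "QQP is much larger than the tasks above (roughly 364k training pairs), so each epoch of the run below takes on the order of half an hour on this GPU. If you want to check the split sizes before launching the long run, the short sketch below loads the dataset with `datasets` and prints the number of rows per split."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Optional sketch: check QQP split sizes before launching the long fine-tuning run below.\n",
        "from datasets import load_dataset\n",
        "\n",
        "qqp = load_dataset('glue', 'qqp')\n",
        "for split_name, split in qqp.items():\n",
        "    print(split_name, split.num_rows)"
      ],
      "execution_count": null,
      "outputs": []
    },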
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "UtA-gDQYp2Hf",
        "outputId": "e0e83d8d-6143-4de0-ed75-b637663522cb"
      },
      "source": [
        "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
        "  --config torchdistill/configs/sample/glue/qqp/ce/bert_base_uncased.yaml \\\n",
        "  --task qqp \\\n",
        "  --log log/glue/qqp/ce/bert_base_uncased.txt \\\n",
        "  --private_output leaderboard/glue/standard/bert_base_uncased/"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "2021-05-20 18:37:33.940493: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0\n",
            "2021/05/20 18:37:35\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qqp/ce/bert_base_uncased.yaml', log='log/glue/qqp/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='qqp', test_only=False, world_size=1)\n",
            "2021/05/20 18:37:35\tINFO\t__main__\tDistributed environment: NO\n",
            "Num processes: 1\n",
            "Process index: 0\n",
            "Local process index: 0\n",
            "Device: cuda\n",
            "Use FP16 precision: True\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"qqp\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
            "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias']\n",
            "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
            "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
            "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
            "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
            "Downloading and preparing dataset glue/qqp (download: 39.76 MiB, generated: 106.55 MiB, post-processed: Unknown size, total: 146.32 MiB) to /root/.cache/huggingface/datasets/glue/qqp/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
            "Downloading: 100% 41.7M/41.7M [00:02<00:00, 18.5MB/s]\n",
            "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/qqp/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
            "100% 364/364 [00:30<00:00, 11.81ba/s]\n",
            "100% 41/41 [00:03<00:00, 12.05ba/s]\n",
            "100% 391/391 [00:33<00:00, 11.78ba/s]\n",
            "2021/05/20 18:39:11\tINFO\t__main__\tStart training\n",
            "2021/05/20 18:39:11\tINFO\ttorchdistill.models.util\t[student model]\n",
            "2021/05/20 18:39:11\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
            "2021/05/20 18:39:11\tINFO\ttorchdistill.core.training\tLoss = 1.0 * OrgLoss\n",
            "2021/05/20 18:39:15\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [    0/11371]  eta: 0:44:42  lr: 1.9999413713247152e-05  sample/s: 18.953558335922274  loss: 0.7289 (0.7289)  time: 0.2359  data: 0.0249  max mem: 1894\n",
            "2021/05/20 18:42:10\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 1000/11371]  eta: 0:30:13  lr: 1.941312696039633e-05  sample/s: 15.413959205948313  loss: 0.3539 (0.4556)  time: 0.1804  data: 0.0040  max mem: 4466\n",
            "2021/05/20 18:45:04\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 2000/11371]  eta: 0:27:13  lr: 1.8826840207545512e-05  sample/s: 19.229958782832753  loss: 0.2990 (0.4011)  time: 0.1769  data: 0.0040  max mem: 4466\n",
            "2021/05/20 18:47:58\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 3000/11371]  eta: 0:24:18  lr: 1.824055345469469e-05  sample/s: 25.509811047064137  loss: 0.3189 (0.3730)  time: 0.1652  data: 0.0038  max mem: 4466\n",
            "2021/05/20 18:50:52\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 4000/11371]  eta: 0:21:23  lr: 1.7654266701843875e-05  sample/s: 23.188035310264  loss: 0.2307 (0.3564)  time: 0.1687  data: 0.0037  max mem: 4466\n",
            "2021/05/20 18:53:46\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 5000/11371]  eta: 0:18:29  lr: 1.7067979948993053e-05  sample/s: 19.944384213029007  loss: 0.2464 (0.3439)  time: 0.1844  data: 0.0039  max mem: 4466\n",
            "2021/05/20 18:56:41\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 6000/11371]  eta: 0:15:35  lr: 1.6481693196142235e-05  sample/s: 19.065104841385644  loss: 0.2434 (0.3341)  time: 0.1857  data: 0.0039  max mem: 4466\n",
            "2021/05/20 18:59:35\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 7000/11371]  eta: 0:12:41  lr: 1.5895406443291413e-05  sample/s: 27.550197136471642  loss: 0.2317 (0.3257)  time: 0.1771  data: 0.0038  max mem: 4466\n",
            "2021/05/20 19:02:29\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 8000/11371]  eta: 0:09:47  lr: 1.5309119690440595e-05  sample/s: 27.452463195650076  loss: 0.2374 (0.3189)  time: 0.1720  data: 0.0039  max mem: 4466\n",
            "2021/05/20 19:05:24\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 9000/11371]  eta: 0:06:53  lr: 1.4722832937589777e-05  sample/s: 27.55974623742111  loss: 0.2524 (0.3134)  time: 0.1816  data: 0.0041  max mem: 4466\n",
            "2021/05/20 19:08:18\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [10000/11371]  eta: 0:03:58  lr: 1.4136546184738957e-05  sample/s: 25.60730677240973  loss: 0.2406 (0.3083)  time: 0.1737  data: 0.0041  max mem: 4466\n",
            "2021/05/20 19:11:13\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [11000/11371]  eta: 0:01:04  lr: 1.3550259431888138e-05  sample/s: 25.47672926559266  loss: 0.2542 (0.3037)  time: 0.1745  data: 0.0038  max mem: 4466\n",
            "2021/05/20 19:12:17\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:33:01\n",
            "2021/05/20 19:13:21\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow\n",
            "2021/05/20 19:13:21\tINFO\t__main__\tValidation: accuracy = 0.8906505070492209, f1 = 0.8568282651640273\n",
            "2021/05/20 19:13:21\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/qqp/ce/qqp-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/qqp/ce/qqp-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/20 19:13:22\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [    0/11371]  eta: 0:34:11  lr: 1.3332747046580484e-05  sample/s: 25.874873148899287  loss: 0.2374 (0.2374)  time: 0.1804  data: 0.0258  max mem: 4466\n",
            "2021/05/20 19:16:16\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 1000/11371]  eta: 0:30:03  lr: 1.2746460293729664e-05  sample/s: 25.403358099913238  loss: 0.1761 (0.2012)  time: 0.1699  data: 0.0041  max mem: 4466\n",
            "2021/05/20 19:19:12\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 2000/11371]  eta: 0:27:20  lr: 1.2160173540878845e-05  sample/s: 20.1367268188174  loss: 0.1889 (0.1964)  time: 0.1785  data: 0.0040  max mem: 4466\n",
            "2021/05/20 19:22:07\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 3000/11371]  eta: 0:24:22  lr: 1.1573886788028025e-05  sample/s: 20.012854249321858  loss: 0.1642 (0.1948)  time: 0.1750  data: 0.0039  max mem: 4466\n",
            "2021/05/20 19:25:00\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 4000/11371]  eta: 0:21:26  lr: 1.0987600035177207e-05  sample/s: 21.822289223229117  loss: 0.1441 (0.1942)  time: 0.1738  data: 0.0040  max mem: 4466\n",
            "2021/05/20 19:27:56\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 5000/11371]  eta: 0:18:33  lr: 1.0401313282326387e-05  sample/s: 21.67187368807216  loss: 0.1709 (0.1951)  time: 0.1916  data: 0.0039  max mem: 4466\n",
            "2021/05/20 19:30:50\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 6000/11371]  eta: 0:15:37  lr: 9.815026529475568e-06  sample/s: 23.474158749702678  loss: 0.2001 (0.1952)  time: 0.1713  data: 0.0038  max mem: 4466\n",
            "2021/05/20 19:33:45\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 7000/11371]  eta: 0:12:43  lr: 9.228739776624748e-06  sample/s: 20.127474362604193  loss: 0.2070 (0.1945)  time: 0.1735  data: 0.0039  max mem: 4466\n",
            "2021/05/20 19:36:38\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 8000/11371]  eta: 0:09:48  lr: 8.642453023773928e-06  sample/s: 25.6112549460212  loss: 0.1906 (0.1941)  time: 0.1829  data: 0.0038  max mem: 4466\n",
            "2021/05/20 19:39:33\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 9000/11371]  eta: 0:06:53  lr: 8.05616627092311e-06  sample/s: 25.53376556174474  loss: 0.1602 (0.1939)  time: 0.1743  data: 0.0039  max mem: 4466\n",
            "2021/05/20 19:42:27\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [10000/11371]  eta: 0:03:59  lr: 7.469879518072289e-06  sample/s: 16.175784168819135  loss: 0.1215 (0.1931)  time: 0.1689  data: 0.0037  max mem: 4466\n",
            "2021/05/20 19:45:20\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [11000/11371]  eta: 0:01:04  lr: 6.883592765221471e-06  sample/s: 25.440067932310306  loss: 0.2038 (0.1924)  time: 0.1755  data: 0.0038  max mem: 4466\n",
            "2021/05/20 19:46:25\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:33:03\n",
            "2021/05/20 19:47:29\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow\n",
            "2021/05/20 19:47:29\tINFO\t__main__\tValidation: accuracy = 0.8938164729161514, f1 = 0.8529643456519505\n",
            "2021/05/20 19:47:29\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [    0/11371]  eta: 0:31:20  lr: 6.666080379913816e-06  sample/s: 28.10809506736675  loss: 0.1162 (0.1162)  time: 0.1653  data: 0.0230  max mem: 4466\n",
            "2021/05/20 19:50:25\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 1000/11371]  eta: 0:30:16  lr: 6.079793627062997e-06  sample/s: 19.35273619043115  loss: 0.0945 (0.1354)  time: 0.1827  data: 0.0038  max mem: 4466\n",
            "2021/05/20 19:53:20\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 2000/11371]  eta: 0:27:20  lr: 5.493506874212178e-06  sample/s: 23.53649814749087  loss: 0.0769 (0.1352)  time: 0.1800  data: 0.0038  max mem: 4466\n",
            "2021/05/20 19:56:13\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 3000/11371]  eta: 0:24:20  lr: 4.9072201213613585e-06  sample/s: 23.28303012724525  loss: 0.0880 (0.1341)  time: 0.1758  data: 0.0038  max mem: 4466\n",
            "2021/05/20 19:59:06\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 4000/11371]  eta: 0:21:22  lr: 4.3209333685105384e-06  sample/s: 21.845674672000083  loss: 0.0814 (0.1324)  time: 0.1704  data: 0.0037  max mem: 4466\n",
            "2021/05/20 20:02:00\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 5000/11371]  eta: 0:18:29  lr: 3.7346466156597192e-06  sample/s: 23.323003522663768  loss: 0.0914 (0.1321)  time: 0.1920  data: 0.0038  max mem: 4466\n",
            "2021/05/20 20:04:54\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 6000/11371]  eta: 0:15:35  lr: 3.1483598628089e-06  sample/s: 25.52378228465109  loss: 0.1056 (0.1315)  time: 0.1734  data: 0.0038  max mem: 4466\n",
            "2021/05/20 20:07:47\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 7000/11371]  eta: 0:12:40  lr: 2.562073109958081e-06  sample/s: 21.64251925639547  loss: 0.0797 (0.1312)  time: 0.1825  data: 0.0039  max mem: 4466\n",
            "2021/05/20 20:10:42\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 8000/11371]  eta: 0:09:46  lr: 1.9757863571072612e-06  sample/s: 25.56271064548175  loss: 0.0827 (0.1310)  time: 0.1719  data: 0.0039  max mem: 4466\n",
            "2021/05/20 20:13:37\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 9000/11371]  eta: 0:06:53  lr: 1.3894996042564418e-06  sample/s: 27.391775552620366  loss: 0.1054 (0.1310)  time: 0.1694  data: 0.0038  max mem: 4466\n",
            "2021/05/20 20:16:32\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [10000/11371]  eta: 0:03:58  lr: 8.032128514056225e-07  sample/s: 27.23238042059746  loss: 0.0959 (0.1307)  time: 0.1694  data: 0.0040  max mem: 4466\n",
            "2021/05/20 20:19:27\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [11000/11371]  eta: 0:01:04  lr: 2.169260985548032e-07  sample/s: 16.004502609510237  loss: 0.0783 (0.1309)  time: 0.1712  data: 0.0038  max mem: 4466\n",
            "2021/05/20 20:20:31\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:33:01\n",
            "2021/05/20 20:21:35\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow\n",
            "2021/05/20 20:21:35\tINFO\t__main__\tValidation: accuracy = 0.8978233984664853, f1 = 0.8634335019339482\n",
            "2021/05/20 20:21:35\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/qqp/ce/qqp-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/qqp/ce/qqp-bert-base-uncased/pytorch_model.bin\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"qqp\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file ./resource/ckpt/glue/qqp/ce/qqp-bert-base-uncased/pytorch_model.bin\n",
            "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
            "\n",
            "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/qqp/ce/qqp-bert-base-uncased.\n",
            "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
            "2021/05/20 20:21:38\tINFO\t__main__\t[Student: bert-base-uncased]\n",
            "2021/05/20 20:22:42\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qqp/default_experiment-1-0.arrow\n",
            "2021/05/20 20:22:42\tINFO\t__main__\tTest: accuracy = 0.8978233984664853, f1 = 0.8634335019339482\n",
            "2021/05/20 20:22:42\tINFO\t__main__\tStart prediction for private dataset(s)\n",
            "2021/05/20 20:22:42\tINFO\t__main__\tqqp/test: 390965 samples\n"
          ],
          "name": "stdout"
        }
      ]
    },
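    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The best QQP checkpoint (config and weights) is saved under `./resource/ckpt/glue/qqp/ce/qqp-bert-base-uncased/`, as the log above shows. The next optional cell is a minimal sanity check, not part of the torchdistill pipeline: it reloads that checkpoint with the `transformers` API and scores one question pair. Label index 1 is assumed to be the duplicate class, following the GLUE QQP convention."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Optional sanity check: reload the fine-tuned QQP checkpoint saved above.\n",
        "# The tokenizer was not saved alongside the checkpoint, so load it from the base model.\n",
        "import torch\n",
        "from transformers import AutoTokenizer, AutoModelForSequenceClassification\n",
        "\n",
        "ckpt_dir = './resource/ckpt/glue/qqp/ce/qqp-bert-base-uncased'\n",
        "tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')\n",
        "model = AutoModelForSequenceClassification.from_pretrained(ckpt_dir).eval()\n",
        "\n",
        "q1 = 'How can I learn Python quickly?'\n",
        "q2 = 'What is the fastest way to learn Python?'\n",
        "inputs = tokenizer(q1, q2, return_tensors='pt', truncation=True, max_length=128)\n",
        "with torch.no_grad():\n",
        "    probs = torch.softmax(model(**inputs).logits, dim=-1)\n",
        "# Index 1 assumed to be the duplicate class (GLUE QQP convention).\n",
        "print('duplicate probability:', probs[0, 1].item())"
      ],
      "execution_count": null,
      "outputs": []
    },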
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nGATNCSI1_vr"
      },
      "source": [
        "### 4.6 MNLI task"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "RMnPiXycp8-B",
        "outputId": "dda279e5-82a5-4d9d-ed34-18b03778ce18"
      },
      "source": [
        "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
        "  --config torchdistill/configs/sample/glue/mnli/ce/bert_base_uncased.yaml \\\n",
        "  --task mnli \\\n",
        "  --log log/glue/mnli/ce/bert_base_uncased.txt \\\n",
        "  --private_output leaderboard/glue/standard/bert_base_uncased/"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "2021-05-20 22:16:41.333339: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0\n",
            "2021/05/20 22:16:43\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/ce/bert_base_uncased.yaml', log='log/glue/mnli/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)\n",
            "2021/05/20 22:16:43\tINFO\t__main__\tDistributed environment: NO\n",
            "Num processes: 1\n",
            "Process index: 0\n",
            "Local process index: 0\n",
            "Device: cuda\n",
            "Use FP16 precision: True\n",
            "\n",
            "2021/05/20 22:16:43\tINFO\tfilelock\tLock 140636129258704 acquired on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock\n",
            "https://huggingface.co/bert-base-uncased/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp3se8_u_z\n",
            "Downloading: 100% 570/570 [00:00<00:00, 604kB/s]\n",
            "storing https://huggingface.co/bert-base-uncased/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "creating metadata file for /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "2021/05/20 22:16:44\tINFO\tfilelock\tLock 140636129258704 released on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"mnli\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"id2label\": {\n",
            "    \"0\": \"LABEL_0\",\n",
            "    \"1\": \"LABEL_1\",\n",
            "    \"2\": \"LABEL_2\"\n",
            "  },\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"label2id\": {\n",
            "    \"LABEL_0\": 0,\n",
            "    \"LABEL_1\": 1,\n",
            "    \"LABEL_2\": 2\n",
            "  },\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "2021/05/20 22:16:44\tINFO\tfilelock\tLock 140636129258960 acquired on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
            "https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp4msvjc70\n",
            "Downloading: 100% 232k/232k [00:00<00:00, 967kB/s]\n",
            "storing https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "creating metadata file for /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "2021/05/20 22:16:45\tINFO\tfilelock\tLock 140636129258960 released on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock\n",
            "2021/05/20 22:16:45\tINFO\tfilelock\tLock 140636129390032 acquired on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock\n",
            "https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpxtjvb7z9\n",
            "Downloading: 100% 466k/466k [00:00<00:00, 1.54MB/s]\n",
            "storing https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "creating metadata file for /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "2021/05/20 22:16:45\tINFO\tfilelock\tLock 140636129390032 released on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock\n",
            "2021/05/20 22:16:46\tINFO\tfilelock\tLock 140636129446608 acquired on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock\n",
            "https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp1wb5mruf\n",
            "Downloading: 100% 28.0/28.0 [00:00<00:00, 35.6kB/s]\n",
            "storing https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "creating metadata file for /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "2021/05/20 22:16:46\tINFO\tfilelock\tLock 140636129446608 released on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "2021/05/20 22:16:46\tINFO\tfilelock\tLock 140636129390480 acquired on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock\n",
            "https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpsbns0vzr\n",
            "Downloading: 100% 440M/440M [00:10<00:00, 43.2MB/s]\n",
            "storing https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
            "creating metadata file for /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
            "2021/05/20 22:16:57\tINFO\tfilelock\tLock 140636129390480 released on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock\n",
            "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
            "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias']\n",
            "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
            "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
            "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']\n",
            "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
            "Downloading: 28.8kB [00:00, 24.0MB/s]       \n",
            "Downloading: 28.7kB [00:00, 28.7MB/s]       \n",
            "Downloading and preparing dataset glue/mnli (download: 298.29 MiB, generated: 78.65 MiB, post-processed: Unknown size, total: 376.95 MiB) to /root/.cache/huggingface/datasets/glue/mnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
            "Downloading: 100% 313M/313M [00:08<00:00, 38.2MB/s]\n",
            "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/mnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
            "100% 393/393 [00:41<00:00,  9.41ba/s]\n",
            "100% 10/10 [00:01<00:00,  8.41ba/s]\n",
            "100% 10/10 [00:01<00:00,  9.24ba/s]\n",
            "100% 10/10 [00:01<00:00,  8.71ba/s]\n",
            "100% 10/10 [00:01<00:00,  9.42ba/s]\n",
            "Downloading and preparing dataset glue/ax (download: 217.05 KiB, generated: 232.80 KiB, post-processed: Unknown size, total: 449.85 KiB) to /root/.cache/huggingface/datasets/glue/ax/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
            "Downloading: 222kB [00:00, 2.74MB/s]\n",
            "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/ax/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
            "100% 2/2 [00:00<00:00, 15.20ba/s]\n",
            "Downloading: 5.75kB [00:00, 5.76MB/s]       \n",
            "2021/05/20 22:18:24\tINFO\t__main__\tStart training\n",
            "2021/05/20 22:18:24\tINFO\ttorchdistill.models.util\t[student model]\n",
            "2021/05/20 22:18:24\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
            "2021/05/20 22:18:24\tINFO\ttorchdistill.core.training\tLoss = 1.0 * OrgLoss\n",
            "2021/05/20 22:18:35\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [    0/12272]  eta: 1:21:07  lr: 1.9999456757931336e-05  sample/s: 11.236528662591018  loss: 1.1525 (1.1525)  time: 0.3967  data: 0.0407  max mem: 2411\n",
            "2021/05/20 22:22:00\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 1000/12272]  eta: 0:38:33  lr: 1.945621468926554e-05  sample/s: 20.141392507791377  loss: 0.6407 (0.8781)  time: 0.2034  data: 0.0043  max mem: 4434\n",
            "2021/05/20 22:25:30\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 2000/12272]  eta: 0:35:31  lr: 1.891297262059974e-05  sample/s: 19.208426548206496  loss: 0.5172 (0.7428)  time: 0.2071  data: 0.0043  max mem: 4434\n",
            "2021/05/20 22:29:02\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 3000/12272]  eta: 0:32:16  lr: 1.8369730551933943e-05  sample/s: 19.869789828485075  loss: 0.4867 (0.6812)  time: 0.2018  data: 0.0047  max mem: 4434\n",
            "2021/05/20 22:32:31\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 4000/12272]  eta: 0:28:48  lr: 1.7826488483268146e-05  sample/s: 21.650283448979827  loss: 0.5033 (0.6411)  time: 0.2123  data: 0.0044  max mem: 4434\n",
            "2021/05/20 22:36:01\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 5000/12272]  eta: 0:25:21  lr: 1.728324641460235e-05  sample/s: 25.314088617029945  loss: 0.4999 (0.6159)  time: 0.2165  data: 0.0045  max mem: 4434\n",
            "2021/05/20 22:39:32\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 6000/12272]  eta: 0:21:53  lr: 1.674000434593655e-05  sample/s: 23.060419252460026  loss: 0.4741 (0.5967)  time: 0.2123  data: 0.0043  max mem: 4434\n",
            "2021/05/20 22:43:01\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 7000/12272]  eta: 0:18:23  lr: 1.6196762277270753e-05  sample/s: 23.32063690637532  loss: 0.4777 (0.5805)  time: 0.2062  data: 0.0045  max mem: 4434\n",
            "2021/05/20 22:46:31\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 8000/12272]  eta: 0:14:54  lr: 1.5653520208604957e-05  sample/s: 18.13110162018225  loss: 0.5661 (0.5679)  time: 0.2189  data: 0.0047  max mem: 4434\n",
            "2021/05/20 22:50:01\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 9000/12272]  eta: 0:11:25  lr: 1.5110278139939158e-05  sample/s: 24.956365226652704  loss: 0.4063 (0.5573)  time: 0.2140  data: 0.0044  max mem: 4434\n",
            "2021/05/20 22:53:30\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [10000/12272]  eta: 0:07:56  lr: 1.4567036071273362e-05  sample/s: 20.236602485003974  loss: 0.4805 (0.5484)  time: 0.1967  data: 0.0042  max mem: 4434\n",
            "2021/05/20 22:56:59\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [11000/12272]  eta: 0:04:26  lr: 1.4023794002607562e-05  sample/s: 20.026757705254614  loss: 0.4752 (0.5397)  time: 0.2029  data: 0.0046  max mem: 4434\n",
            "2021/05/20 23:00:28\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [12000/12272]  eta: 0:00:56  lr: 1.3480551933941765e-05  sample/s: 20.192736630021642  loss: 0.4827 (0.5330)  time: 0.2086  data: 0.0043  max mem: 4434\n",
            "2021/05/20 23:01:25\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:42:49\n",
            "2021/05/20 23:01:45\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow\n",
            "2021/05/20 23:01:45\tINFO\t__main__\tValidation: accuracy = 0.8271013754457464\n",
            "2021/05/20 23:01:45\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/mnli/ce/mnli-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/mnli/ce/mnli-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/20 23:01:46\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [    0/12272]  eta: 0:51:40  lr: 1.333279009126467e-05  sample/s: 18.308422617163885  loss: 0.3718 (0.3718)  time: 0.2526  data: 0.0342  max mem: 4434\n",
            "2021/05/20 23:05:14\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 1000/12272]  eta: 0:39:08  lr: 1.2789548022598873e-05  sample/s: 20.30181391352751  loss: 0.3042 (0.3659)  time: 0.2081  data: 0.0043  max mem: 4434\n",
            "2021/05/20 23:08:45\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 2000/12272]  eta: 0:35:50  lr: 1.2246305953933073e-05  sample/s: 22.79649326861932  loss: 0.3443 (0.3650)  time: 0.2106  data: 0.0045  max mem: 4434\n",
            "2021/05/20 23:12:15\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 3000/12272]  eta: 0:32:23  lr: 1.1703063885267276e-05  sample/s: 19.32019848522013  loss: 0.3010 (0.3643)  time: 0.2184  data: 0.0045  max mem: 4434\n",
            "2021/05/20 23:15:42\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 4000/12272]  eta: 0:28:49  lr: 1.115982181660148e-05  sample/s: 17.966124273820043  loss: 0.3114 (0.3621)  time: 0.2234  data: 0.0045  max mem: 4434\n",
            "2021/05/20 23:19:13\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 5000/12272]  eta: 0:25:22  lr: 1.061657974793568e-05  sample/s: 18.109944959515722  loss: 0.3884 (0.3624)  time: 0.2110  data: 0.0044  max mem: 4434\n",
            "2021/05/20 23:22:43\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 6000/12272]  eta: 0:21:53  lr: 1.0073337679269883e-05  sample/s: 21.846698752005334  loss: 0.3534 (0.3619)  time: 0.2216  data: 0.0043  max mem: 4434\n",
            "2021/05/20 23:26:13\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 7000/12272]  eta: 0:18:24  lr: 9.530095610604087e-06  sample/s: 19.275316263001223  loss: 0.3284 (0.3607)  time: 0.2063  data: 0.0043  max mem: 4434\n",
            "2021/05/20 23:29:43\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 8000/12272]  eta: 0:14:55  lr: 8.986853541938288e-06  sample/s: 15.418761441920992  loss: 0.3870 (0.3595)  time: 0.2104  data: 0.0044  max mem: 4434\n",
            "2021/05/20 23:33:12\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 9000/12272]  eta: 0:11:25  lr: 8.44361147327249e-06  sample/s: 21.981170095407297  loss: 0.3105 (0.3588)  time: 0.2106  data: 0.0043  max mem: 4434\n",
            "2021/05/20 23:36:41\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [10000/12272]  eta: 0:07:55  lr: 7.900369404606693e-06  sample/s: 16.91985033835231  loss: 0.3401 (0.3584)  time: 0.2160  data: 0.0044  max mem: 4434\n",
            "2021/05/20 23:40:11\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [11000/12272]  eta: 0:04:26  lr: 7.357127335940896e-06  sample/s: 20.137935339058835  loss: 0.3152 (0.3581)  time: 0.2176  data: 0.0044  max mem: 4434\n",
            "2021/05/20 23:43:40\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [12000/12272]  eta: 0:00:56  lr: 6.813885267275099e-06  sample/s: 15.460406773850805  loss: 0.3395 (0.3576)  time: 0.2184  data: 0.0045  max mem: 4434\n",
            "2021/05/20 23:44:37\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:42:51\n",
            "2021/05/20 23:44:57\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow\n",
            "2021/05/20 23:44:57\tINFO\t__main__\tValidation: accuracy = 0.8309730005094244\n",
            "2021/05/20 23:44:57\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/mnli/ce/mnli-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/mnli/ce/mnli-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/20 23:44:59\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [    0/12272]  eta: 0:56:14  lr: 6.666123424598001e-06  sample/s: 16.109354385870578  loss: 0.2955 (0.2955)  time: 0.2750  data: 0.0267  max mem: 4434\n",
            "2021/05/20 23:48:28\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 1000/12272]  eta: 0:39:15  lr: 6.122881355932204e-06  sample/s: 15.522932044907392  loss: 0.2688 (0.2692)  time: 0.2076  data: 0.0043  max mem: 4434\n",
            "2021/05/20 23:51:58\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 2000/12272]  eta: 0:35:52  lr: 5.579639287266406e-06  sample/s: 16.84523706727767  loss: 0.2549 (0.2666)  time: 0.2115  data: 0.0045  max mem: 4434\n",
            "2021/05/20 23:55:28\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 3000/12272]  eta: 0:32:26  lr: 5.0363972186006095e-06  sample/s: 25.638221041805792  loss: 0.2355 (0.2671)  time: 0.2106  data: 0.0043  max mem: 4434\n",
            "2021/05/20 23:58:56\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 4000/12272]  eta: 0:28:52  lr: 4.493155149934811e-06  sample/s: 16.007694127884342  loss: 0.2944 (0.2668)  time: 0.2013  data: 0.0046  max mem: 4434\n",
            "2021/05/21 00:02:27\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 5000/12272]  eta: 0:25:24  lr: 3.949913081269014e-06  sample/s: 23.148075686624043  loss: 0.2532 (0.2678)  time: 0.2010  data: 0.0045  max mem: 4434\n",
            "2021/05/21 00:05:56\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 6000/12272]  eta: 0:21:54  lr: 3.4066710126032164e-06  sample/s: 19.44152090831767  loss: 0.2265 (0.2667)  time: 0.2127  data: 0.0045  max mem: 4434\n",
            "2021/05/21 00:09:24\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 7000/12272]  eta: 0:18:23  lr: 2.8634289439374186e-06  sample/s: 18.423382031256864  loss: 0.2379 (0.2654)  time: 0.2204  data: 0.0045  max mem: 4434\n",
            "2021/05/21 00:12:54\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 8000/12272]  eta: 0:14:54  lr: 2.320186875271621e-06  sample/s: 18.03493639426312  loss: 0.2019 (0.2649)  time: 0.2058  data: 0.0044  max mem: 4434\n",
            "2021/05/21 00:16:22\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 9000/12272]  eta: 0:11:24  lr: 1.7769448066058238e-06  sample/s: 21.65959105896569  loss: 0.2636 (0.2654)  time: 0.2113  data: 0.0045  max mem: 4434\n",
            "2021/05/21 00:19:54\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [10000/12272]  eta: 0:07:55  lr: 1.2337027379400262e-06  sample/s: 23.077833900058323  loss: 0.2346 (0.2649)  time: 0.2122  data: 0.0044  max mem: 4434\n",
            "2021/05/21 00:23:24\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [11000/12272]  eta: 0:04:26  lr: 6.904606692742287e-07  sample/s: 15.398906110745502  loss: 0.2577 (0.2645)  time: 0.1962  data: 0.0045  max mem: 4434\n",
            "2021/05/21 00:26:53\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [12000/12272]  eta: 0:00:56  lr: 1.4721860060843112e-07  sample/s: 15.460620481072802  loss: 0.1862 (0.2637)  time: 0.2069  data: 0.0043  max mem: 4434\n",
            "2021/05/21 00:27:51\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:42:52\n",
            "2021/05/21 00:28:11\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow\n",
            "2021/05/21 00:28:11\tINFO\t__main__\tValidation: accuracy = 0.8315843097300051\n",
            "2021/05/21 00:28:11\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/mnli/ce/mnli-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/mnli/ce/mnli-bert-base-uncased/pytorch_model.bin\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"mnli\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"id2label\": {\n",
            "    \"0\": \"LABEL_0\",\n",
            "    \"1\": \"LABEL_1\",\n",
            "    \"2\": \"LABEL_2\"\n",
            "  },\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"label2id\": {\n",
            "    \"LABEL_0\": 0,\n",
            "    \"LABEL_1\": 1,\n",
            "    \"LABEL_2\": 2\n",
            "  },\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file ./resource/ckpt/glue/mnli/ce/mnli-bert-base-uncased/pytorch_model.bin\n",
            "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
            "\n",
            "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/mnli/ce/mnli-bert-base-uncased.\n",
            "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
            "2021/05/21 00:28:15\tINFO\t__main__\t[Student: bert-base-uncased]\n",
            "2021/05/21 00:28:35\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow\n",
            "2021/05/21 00:28:35\tINFO\t__main__\tTest: accuracy = 0.8315843097300051\n",
            "2021/05/21 00:28:35\tINFO\t__main__\tStart prediction for private dataset(s)\n",
            "2021/05/21 00:28:35\tINFO\t__main__\tmnli/test_m: 9796 samples\n",
            "2021/05/21 00:28:55\tINFO\t__main__\tmnli/test_mm: 9847 samples\n",
            "2021/05/21 00:29:15\tINFO\t__main__\tax/test_ax: 1104 samples\n"
          ],
          "name": "stdout"
        }
      ]
    },
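    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As the MNLI log shows, predictions for the private matched and mismatched test splits and the AX diagnostic set are written under the `--private_output` directory, `leaderboard/glue/standard/bert_base_uncased/`. The optional cell below simply lists whatever prediction files have been written there so far; the exact file names are produced by the example script."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Optional check: list the prediction files written so far for GLUE submission.\n",
        "# This cell only inspects the output directory; it does not modify anything.\n",
        "import os\n",
        "\n",
        "output_dir = 'leaderboard/glue/standard/bert_base_uncased/'\n",
        "for name in sorted(os.listdir(output_dir)):\n",
        "    path = os.path.join(output_dir, name)\n",
        "    print(name, os.path.getsize(path), 'bytes')"
      ],
      "execution_count": null,
      "outputs": []
    },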
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dNMwSfQx2DN_"
      },
      "source": [
        "### 4.7 QNLI task"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "background_save": true,
          "base_uri": "https://localhost:8080/"
        },
        "id": "tm6AEL9cqAnd",
        "outputId": "2837a205-1442-401a-a90e-d1a4f593ec37"
      },
      "source": [
        "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
        "  --config torchdistill/configs/sample/glue/qnli/ce/bert_base_uncased.yaml \\\n",
        "  --task qnli \\\n",
        "  --log log/glue/qnli/ce/bert_base_uncased.txt \\\n",
        "  --private_output leaderboard/glue/standard/bert_base_uncased/"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "2021-05-20 21:27:21.185270: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0\n",
            "2021/05/20 21:27:23\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qnli/ce/bert_base_uncased.yaml', log='log/glue/qnli/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='qnli', test_only=False, world_size=1)\n",
            "2021/05/20 21:27:23\tINFO\t__main__\tDistributed environment: NO\n",
            "Num processes: 1\n",
            "Process index: 0\n",
            "Local process index: 0\n",
            "Device: cuda\n",
            "Use FP16 precision: True\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"qnli\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
            "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']\n",
            "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
            "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
            "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']\n",
            "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
            "Downloading and preparing dataset glue/qnli (download: 10.14 MiB, generated: 27.11 MiB, post-processed: Unknown size, total: 37.24 MiB) to /root/.cache/huggingface/datasets/glue/qnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
            "Downloading: 100% 10.6M/10.6M [00:00<00:00, 11.3MB/s]\n",
            "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/qnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
            "100% 105/105 [00:13<00:00,  7.57ba/s]\n",
            "100% 6/6 [00:00<00:00,  7.26ba/s]\n",
            "100% 6/6 [00:00<00:00,  8.21ba/s]\n",
            "2021/05/20 21:27:46\tINFO\t__main__\tStart training\n",
            "2021/05/20 21:27:46\tINFO\ttorchdistill.models.util\t[student model]\n",
            "2021/05/20 21:27:46\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
            "2021/05/20 21:27:46\tINFO\ttorchdistill.core.training\tLoss = 1.0 * OrgLoss\n",
            "2021/05/20 21:27:50\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [   0/3274]  eta: 0:16:44  lr: 1.9997963754836084e-05  sample/s: 13.456997837539543  loss: 0.7080 (0.7080)  time: 0.3070  data: 0.0097  max mem: 2795\n",
            "2021/05/20 21:29:40\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 500/3274]  eta: 0:10:11  lr: 1.8979841172877215e-05  sample/s: 16.796649723730354  loss: 0.4036 (0.5075)  time: 0.2243  data: 0.0044  max mem: 4474\n",
            "2021/05/20 21:31:32\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [1000/3274]  eta: 0:08:25  lr: 1.796171859091835e-05  sample/s: 17.90282000198479  loss: 0.3010 (0.4388)  time: 0.2216  data: 0.0045  max mem: 4474\n",
            "2021/05/20 21:33:23\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [1500/3274]  eta: 0:06:34  lr: 1.694359600895948e-05  sample/s: 20.148624970426567  loss: 0.3361 (0.4038)  time: 0.2177  data: 0.0043  max mem: 4474\n",
            "2021/05/20 21:35:14\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [2000/3274]  eta: 0:04:42  lr: 1.5925473427000613e-05  sample/s: 17.9254485311087  loss: 0.2727 (0.3794)  time: 0.2230  data: 0.0044  max mem: 4474\n",
            "2021/05/20 21:37:05\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [2500/3274]  eta: 0:02:51  lr: 1.4907350845041744e-05  sample/s: 22.980913636052325  loss: 0.2204 (0.3644)  time: 0.2191  data: 0.0044  max mem: 4474\n",
            "2021/05/20 21:38:56\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [3000/3274]  eta: 0:01:00  lr: 1.3889228263082878e-05  sample/s: 19.11297437206081  loss: 0.2908 (0.3518)  time: 0.2141  data: 0.0043  max mem: 4474\n",
            "2021/05/20 21:39:56\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:12:06\n",
            "2021/05/20 21:40:09\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow\n",
            "2021/05/20 21:40:09\tINFO\t__main__\tValidation: accuracy = 0.8991396668497162\n",
            "2021/05/20 21:40:09\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/qnli/ce/qnli-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/qnli/ce/qnli-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/20 21:40:11\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [   0/3274]  eta: 0:11:13  lr: 1.3331297088169417e-05  sample/s: 20.445632768161914  loss: 0.2224 (0.2224)  time: 0.2056  data: 0.0099  max mem: 4474\n",
            "2021/05/20 21:42:03\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 500/3274]  eta: 0:10:20  lr: 1.2313174506210548e-05  sample/s: 16.70393583349844  loss: 0.1787 (0.2109)  time: 0.2262  data: 0.0045  max mem: 4474\n",
            "2021/05/20 21:43:53\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [1000/3274]  eta: 0:08:24  lr: 1.1295051924251682e-05  sample/s: 19.163755074440356  loss: 0.1620 (0.2087)  time: 0.2288  data: 0.0046  max mem: 4474\n",
            "2021/05/20 21:45:44\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [1500/3274]  eta: 0:06:33  lr: 1.0276929342292811e-05  sample/s: 20.12819879114788  loss: 0.1297 (0.2058)  time: 0.2050  data: 0.0044  max mem: 4474\n",
            "2021/05/20 21:47:35\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [2000/3274]  eta: 0:04:42  lr: 9.258806760333945e-06  sample/s: 20.13675098780072  loss: 0.1726 (0.2064)  time: 0.2218  data: 0.0044  max mem: 4474\n",
            "2021/05/20 21:49:26\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [2500/3274]  eta: 0:02:51  lr: 8.240684178375076e-06  sample/s: 20.10665727081738  loss: 0.2318 (0.2065)  time: 0.2101  data: 0.0045  max mem: 4474\n",
            "2021/05/20 21:51:17\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [3000/3274]  eta: 0:01:00  lr: 7.222561596416208e-06  sample/s: 20.097143527957392  loss: 0.2641 (0.2061)  time: 0.2183  data: 0.0044  max mem: 4474\n",
            "2021/05/20 21:52:18\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:12:06\n",
            "2021/05/20 21:52:30\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow\n",
            "2021/05/20 21:52:30\tINFO\t__main__\tValidation: accuracy = 0.90591250228812\n",
            "2021/05/20 21:52:30\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/qnli/ce/qnli-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/qnli/ce/qnli-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/20 21:52:32\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [   0/3274]  eta: 0:12:27  lr: 6.6646304215027494e-06  sample/s: 18.23105117603273  loss: 0.4684 (0.4684)  time: 0.2284  data: 0.0090  max mem: 4474\n",
            "2021/05/20 21:54:23\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 500/3274]  eta: 0:10:17  lr: 5.646507839543881e-06  sample/s: 19.454867190960684  loss: 0.0978 (0.1246)  time: 0.2282  data: 0.0045  max mem: 4474\n",
            "2021/05/20 21:56:14\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [1000/3274]  eta: 0:08:24  lr: 4.628385257585013e-06  sample/s: 21.82174993171442  loss: 0.1097 (0.1288)  time: 0.2095  data: 0.0044  max mem: 4474\n",
            "2021/05/20 21:58:05\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [1500/3274]  eta: 0:06:33  lr: 3.6102626756261456e-06  sample/s: 15.352714394871818  loss: 0.1166 (0.1326)  time: 0.2186  data: 0.0045  max mem: 4474\n",
            "2021/05/20 21:59:56\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [2000/3274]  eta: 0:04:42  lr: 2.5921400936672775e-06  sample/s: 19.22350285593157  loss: 0.0571 (0.1322)  time: 0.2184  data: 0.0044  max mem: 4474\n",
            "2021/05/20 22:01:47\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [2500/3274]  eta: 0:02:51  lr: 1.5740175117084096e-06  sample/s: 15.970802280453082  loss: 0.1189 (0.1323)  time: 0.2379  data: 0.0044  max mem: 4474\n",
            "2021/05/20 22:03:38\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [3000/3274]  eta: 0:01:00  lr: 5.558949297495419e-07  sample/s: 19.252224184484813  loss: 0.0675 (0.1320)  time: 0.2290  data: 0.0044  max mem: 4474\n",
            "2021/05/20 22:04:39\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:12:07\n",
            "2021/05/20 22:04:52\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow\n",
            "2021/05/20 22:04:52\tINFO\t__main__\tValidation: accuracy = 0.9055464030752334\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"qnli\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file ./resource/ckpt/glue/qnli/ce/qnli-bert-base-uncased/pytorch_model.bin\n",
            "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
            "\n",
            "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/qnli/ce/qnli-bert-base-uncased.\n",
            "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
            "2021/05/20 22:04:53\tINFO\t__main__\t[Student: bert-base-uncased]\n",
            "2021/05/20 22:05:06\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow\n",
            "2021/05/20 22:05:06\tINFO\t__main__\tTest: accuracy = 0.90591250228812\n",
            "2021/05/20 22:05:06\tINFO\t__main__\tStart prediction for private dataset(s)\n",
            "2021/05/20 22:05:06\tINFO\t__main__\tqnli/test: 5463 samples\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b99U7uAX2HI5"
      },
      "source": [
        "### 4.8 RTE task"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "m-iYN-RSqEwF",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "eae668a0-1836-49ff-8630-3bab3316262b"
      },
      "source": [
        "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
        "  --config torchdistill/configs/sample/glue/rte/ce/bert_base_uncased.yaml \\\n",
        "  --task rte \\\n",
        "  --log log/glue/rte/ce/bert_base_uncased.txt \\\n",
        "  --private_output leaderboard/glue/standard/bert_base_uncased/"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "2021-05-21 00:29:21.369691: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0\n",
            "2021/05/21 00:29:23\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/rte/ce/bert_base_uncased.yaml', log='log/glue/rte/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='rte', test_only=False, world_size=1)\n",
            "2021/05/21 00:29:23\tINFO\t__main__\tDistributed environment: NO\n",
            "Num processes: 1\n",
            "Process index: 0\n",
            "Local process index: 0\n",
            "Device: cuda\n",
            "Use FP16 precision: True\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"rte\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
            "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias']\n",
            "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
            "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
            "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
            "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
            "Downloading and preparing dataset glue/rte (download: 680.81 KiB, generated: 1.83 MiB, post-processed: Unknown size, total: 2.49 MiB) to /root/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
            "Downloading: 100% 697k/697k [00:00<00:00, 5.75MB/s]\n",
            "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/rte/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
            "100% 3/3 [00:00<00:00,  7.10ba/s]\n",
            "100% 1/1 [00:00<00:00, 22.82ba/s]\n",
            "100% 3/3 [00:00<00:00,  6.53ba/s]\n",
            "2021/05/21 00:29:38\tINFO\t__main__\tStart training\n",
            "2021/05/21 00:29:38\tINFO\ttorchdistill.models.util\t[student model]\n",
            "2021/05/21 00:29:38\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
            "2021/05/21 00:29:38\tINFO\ttorchdistill.core.training\tLoss = 1.0 * OrgLoss\n",
            "2021/05/21 00:29:42\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 0/78]  eta: 0:00:25  lr: 1.9914529914529916e-05  sample/s: 12.731443384242262  loss: 0.6580 (0.6580)  time: 0.3207  data: 0.0065  max mem: 3175\n",
            "2021/05/21 00:29:55\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [50/78]  eta: 0:00:07  lr: 1.5641025641025644e-05  sample/s: 15.345538628148278  loss: 0.6907 (0.6965)  time: 0.2646  data: 0.0050  max mem: 4462\n",
            "2021/05/21 00:30:02\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:00:20\n",
            "2021/05/21 00:30:03\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow\n",
            "2021/05/21 00:30:03\tINFO\t__main__\tValidation: accuracy = 0.5703971119133574\n",
            "2021/05/21 00:30:03\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/rte/ce/rte-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/rte/ce/rte-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/21 00:30:04\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 0/78]  eta: 0:00:20  lr: 1.3247863247863248e-05  sample/s: 15.430077641436725  loss: 0.6740 (0.6740)  time: 0.2651  data: 0.0059  max mem: 4462\n",
            "2021/05/21 00:30:18\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [50/78]  eta: 0:00:07  lr: 8.974358974358976e-06  sample/s: 14.827031388501753  loss: 0.6852 (0.6872)  time: 0.2714  data: 0.0053  max mem: 4462\n",
            "2021/05/21 00:30:25\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:00:21\n",
            "2021/05/21 00:30:26\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow\n",
            "2021/05/21 00:30:26\tINFO\t__main__\tValidation: accuracy = 0.5848375451263538\n",
            "2021/05/21 00:30:26\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/rte/ce/rte-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/rte/ce/rte-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/21 00:30:27\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 0/78]  eta: 0:00:20  lr: 6.581196581196582e-06  sample/s: 15.358659491999905  loss: 0.6655 (0.6655)  time: 0.2659  data: 0.0055  max mem: 4462\n",
            "2021/05/21 00:30:41\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [50/78]  eta: 0:00:07  lr: 2.307692307692308e-06  sample/s: 15.157076132226328  loss: 0.6805 (0.6815)  time: 0.2705  data: 0.0050  max mem: 4462\n",
            "2021/05/21 00:30:48\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:00:21\n",
            "2021/05/21 00:30:49\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow\n",
            "2021/05/21 00:30:49\tINFO\t__main__\tValidation: accuracy = 0.592057761732852\n",
            "2021/05/21 00:30:49\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/rte/ce/rte-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/rte/ce/rte-bert-base-uncased/pytorch_model.bin\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"rte\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file ./resource/ckpt/glue/rte/ce/rte-bert-base-uncased/pytorch_model.bin\n",
            "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
            "\n",
            "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/rte/ce/rte-bert-base-uncased.\n",
            "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
            "2021/05/21 00:30:53\tINFO\t__main__\t[Student: bert-base-uncased]\n",
            "2021/05/21 00:30:54\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/rte/default_experiment-1-0.arrow\n",
            "2021/05/21 00:30:54\tINFO\t__main__\tTest: accuracy = 0.592057761732852\n",
            "2021/05/21 00:30:54\tINFO\t__main__\tStart prediction for private dataset(s)\n",
            "2021/05/21 00:30:54\tINFO\t__main__\trte/test: 3000 samples\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TUjMUFiy2LFP"
      },
      "source": [
        "### 4.9 WNLI task"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "8pHlvCY0qIVE",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "520a422e-4b99-431a-e4b8-74995856e07a"
      },
      "source": [
        "!accelerate launch torchdistill/examples/hf_transformers/text_classification.py \\\n",
        "  --config torchdistill/configs/sample/glue/wnli/ce/bert_base_uncased.yaml \\\n",
        "  --task wnli \\\n",
        "  --log log/glue/wnli/ce/bert_base_uncased.txt \\\n",
        "  --private_output leaderboard/glue/standard/bert_base_uncased/"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "2021-05-21 00:31:03.998542: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0\n",
            "2021/05/21 00:31:05\tINFO\t__main__\tNamespace(adjust_lr=False, config='torchdistill/configs/sample/glue/wnli/ce/bert_base_uncased.yaml', log='log/glue/wnli/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='wnli', test_only=False, world_size=1)\n",
            "2021/05/21 00:31:05\tINFO\t__main__\tDistributed environment: NO\n",
            "Num processes: 1\n",
            "Process index: 0\n",
            "Local process index: 0\n",
            "Device: cuda\n",
            "Use FP16 precision: True\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"wnli\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f\n",
            "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias']\n",
            "- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
            "- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
            "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
            "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
            "Downloading and preparing dataset glue/wnli (download: 28.32 KiB, generated: 154.03 KiB, post-processed: Unknown size, total: 182.35 KiB) to /root/.cache/huggingface/datasets/glue/wnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...\n",
            "Downloading: 100% 29.0k/29.0k [00:00<00:00, 1.09MB/s]\n",
            "Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/wnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.\n",
            "100% 1/1 [00:00<00:00, 15.31ba/s]\n",
            "100% 1/1 [00:00<00:00, 131.46ba/s]\n",
            "100% 1/1 [00:00<00:00, 50.33ba/s]\n",
            "2021/05/21 00:31:10\tINFO\t__main__\tStart training\n",
            "2021/05/21 00:31:10\tINFO\ttorchdistill.models.util\t[student model]\n",
            "2021/05/21 00:31:10\tINFO\ttorchdistill.models.util\tUsing the original student model\n",
            "2021/05/21 00:31:10\tINFO\ttorchdistill.core.training\tLoss = 1.0 * OrgLoss\n",
            "2021/05/21 00:31:14\tINFO\ttorchdistill.misc.log\tEpoch: [0]  [ 0/20]  eta: 0:00:04  lr: 1.98e-05  sample/s: 16.493965136707324  loss: 0.6988 (0.6988)  time: 0.2478  data: 0.0052  max mem: 2057\n",
            "2021/05/21 00:31:18\tINFO\ttorchdistill.misc.log\tEpoch: [0] Total time: 0:00:03\n",
            "2021/05/21 00:31:18\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/wnli/default_experiment-1-0.arrow\n",
            "2021/05/21 00:31:18\tINFO\t__main__\tValidation: accuracy = 0.5633802816901409\n",
            "2021/05/21 00:31:18\tINFO\t__main__\tUpdating ckpt\n",
            "Configuration saved in ./resource/ckpt/glue/wnli/ce/wnli-bert-base-uncased/config.json\n",
            "Model weights saved in ./resource/ckpt/glue/wnli/ce/wnli-bert-base-uncased/pytorch_model.bin\n",
            "2021/05/21 00:31:19\tINFO\ttorchdistill.misc.log\tEpoch: [1]  [ 0/20]  eta: 0:00:04  lr: 1.58e-05  sample/s: 20.095289998718382  loss: 0.6915 (0.6915)  time: 0.2046  data: 0.0055  max mem: 4058\n",
            "2021/05/21 00:31:23\tINFO\ttorchdistill.misc.log\tEpoch: [1] Total time: 0:00:03\n",
            "2021/05/21 00:31:23\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/wnli/default_experiment-1-0.arrow\n",
            "2021/05/21 00:31:23\tINFO\t__main__\tValidation: accuracy = 0.5352112676056338\n",
            "2021/05/21 00:31:23\tINFO\ttorchdistill.misc.log\tEpoch: [2]  [ 0/20]  eta: 0:00:03  lr: 1.18e-05  sample/s: 22.6092492295004  loss: 0.6930 (0.6930)  time: 0.1817  data: 0.0048  max mem: 4060\n",
            "2021/05/21 00:31:27\tINFO\ttorchdistill.misc.log\tEpoch: [2] Total time: 0:00:03\n",
            "2021/05/21 00:31:27\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/wnli/default_experiment-1-0.arrow\n",
            "2021/05/21 00:31:27\tINFO\t__main__\tValidation: accuracy = 0.4225352112676056\n",
            "2021/05/21 00:31:27\tINFO\ttorchdistill.misc.log\tEpoch: [3]  [ 0/20]  eta: 0:00:03  lr: 7.800000000000002e-06  sample/s: 21.7976226514621  loss: 0.6963 (0.6963)  time: 0.1874  data: 0.0038  max mem: 4060\n",
            "2021/05/21 00:31:31\tINFO\ttorchdistill.misc.log\tEpoch: [3] Total time: 0:00:03\n",
            "2021/05/21 00:31:31\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/wnli/default_experiment-1-0.arrow\n",
            "2021/05/21 00:31:31\tINFO\t__main__\tValidation: accuracy = 0.4788732394366197\n",
            "2021/05/21 00:31:31\tINFO\ttorchdistill.misc.log\tEpoch: [4]  [ 0/20]  eta: 0:00:04  lr: 3.8000000000000005e-06  sample/s: 16.863777771300082  loss: 0.6933 (0.6933)  time: 0.2413  data: 0.0041  max mem: 4060\n",
            "2021/05/21 00:31:35\tINFO\ttorchdistill.misc.log\tEpoch: [4] Total time: 0:00:03\n",
            "2021/05/21 00:31:35\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/wnli/default_experiment-1-0.arrow\n",
            "2021/05/21 00:31:35\tINFO\t__main__\tValidation: accuracy = 0.4507042253521127\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"finetuning_task\": \"wnli\",\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e\n",
            "Model config BertConfig {\n",
            "  \"architectures\": [\n",
            "    \"BertForMaskedLM\"\n",
            "  ],\n",
            "  \"attention_probs_dropout_prob\": 0.1,\n",
            "  \"gradient_checkpointing\": false,\n",
            "  \"hidden_act\": \"gelu\",\n",
            "  \"hidden_dropout_prob\": 0.1,\n",
            "  \"hidden_size\": 768,\n",
            "  \"initializer_range\": 0.02,\n",
            "  \"intermediate_size\": 3072,\n",
            "  \"layer_norm_eps\": 1e-12,\n",
            "  \"max_position_embeddings\": 512,\n",
            "  \"model_type\": \"bert\",\n",
            "  \"num_attention_heads\": 12,\n",
            "  \"num_hidden_layers\": 12,\n",
            "  \"pad_token_id\": 0,\n",
            "  \"position_embedding_type\": \"absolute\",\n",
            "  \"transformers_version\": \"4.6.1\",\n",
            "  \"type_vocab_size\": 2,\n",
            "  \"use_cache\": true,\n",
            "  \"vocab_size\": 30522\n",
            "}\n",
            "\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/added_tokens.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/special_tokens_map.json from cache at None\n",
            "loading file https://huggingface.co/bert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79\n",
            "loading weights file ./resource/ckpt/glue/wnli/ce/wnli-bert-base-uncased/pytorch_model.bin\n",
            "All model checkpoint weights were used when initializing BertForSequenceClassification.\n",
            "\n",
            "All the weights of BertForSequenceClassification were initialized from the model checkpoint at ./resource/ckpt/glue/wnli/ce/wnli-bert-base-uncased.\n",
            "If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForSequenceClassification for predictions without further training.\n",
            "2021/05/21 00:31:38\tINFO\t__main__\t[Student: bert-base-uncased]\n",
            "2021/05/21 00:31:38\tINFO\t/usr/local/lib/python3.7/dist-packages/datasets/metric.py\tRemoving /root/.cache/huggingface/metrics/glue/wnli/default_experiment-1-0.arrow\n",
            "2021/05/21 00:31:38\tINFO\t__main__\tTest: accuracy = 0.5633802816901409\n",
            "2021/05/21 00:31:38\tINFO\t__main__\tStart prediction for private dataset(s)\n",
            "2021/05/21 00:31:38\tINFO\t__main__\twnli/test: 146 samples\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "K18Cynocl-GN"
      },
      "source": [
        "# 5. Validate your prediction files for GLUE leaderboard\n",
        "To make sure your prediction files contain the right numbers of samples (lines), you should see the following output by `wc -l <your prediction dir path>`.\n",
        "\n",
        "```\n",
        "   1105 AX.tsv\n",
        "   1064 CoLA.tsv\n",
        "   9848 MNLI-mm.tsv\n",
        "   9797 MNLI-m.tsv\n",
        "   1726 MRPC.tsv\n",
        "   5464 QNLI.tsv\n",
        " 390966 QQP.tsv\n",
        "   3001 RTE.tsv\n",
        "   1822 SST-2.tsv\n",
        "   1380 STS-B.tsv\n",
        "    147 WNLI.tsv\n",
        " 426320 total\n",
        "```"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "0gynS2fvnVl4",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "4c3249e6-dd7f-4660-f614-da9333ec372b"
      },
      "source": [
        "!wc -l leaderboard/glue/standard/bert_base_uncased/*"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "   1105 leaderboard/glue/standard/bert_base_uncased/AX.tsv\n",
            "   1064 leaderboard/glue/standard/bert_base_uncased/CoLA.tsv\n",
            "   9848 leaderboard/glue/standard/bert_base_uncased/MNLI-mm.tsv\n",
            "   9797 leaderboard/glue/standard/bert_base_uncased/MNLI-m.tsv\n",
            "   1726 leaderboard/glue/standard/bert_base_uncased/MRPC.tsv\n",
            "   5464 leaderboard/glue/standard/bert_base_uncased/QNLI.tsv\n",
            " 390966 leaderboard/glue/standard/bert_base_uncased/QQP.tsv\n",
            "   3001 leaderboard/glue/standard/bert_base_uncased/RTE.tsv\n",
            "   1822 leaderboard/glue/standard/bert_base_uncased/SST-2.tsv\n",
            "   1380 leaderboard/glue/standard/bert_base_uncased/STS-B.tsv\n",
            "    147 leaderboard/glue/standard/bert_base_uncased/WNLI.tsv\n",
            " 426320 total\n"
          ],
          "name": "stdout"
        }
      ]
    },
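    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If you prefer checking the counts programmatically, here is a minimal sketch that compares each prediction file against the expected line counts listed above. It assumes the same output directory as the cell above (`leaderboard/glue/standard/bert_base_uncased/`) and counts lines roughly as `wc -l` does (assuming each line ends with a newline)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Minimal sketch: verify that each prediction file has the expected number of lines.\n",
        "# The expected counts below are copied from the table in Section 5.\n",
        "from pathlib import Path\n",
        "\n",
        "expected_line_counts = {\n",
        "    'AX.tsv': 1105, 'CoLA.tsv': 1064, 'MNLI-mm.tsv': 9848, 'MNLI-m.tsv': 9797,\n",
        "    'MRPC.tsv': 1726, 'QNLI.tsv': 5464, 'QQP.tsv': 390966, 'RTE.tsv': 3001,\n",
        "    'SST-2.tsv': 1822, 'STS-B.tsv': 1380, 'WNLI.tsv': 147\n",
        "}\n",
        "\n",
        "pred_dir = Path('leaderboard/glue/standard/bert_base_uncased/')\n",
        "for file_name, expected in expected_line_counts.items():\n",
        "    with open(pred_dir / file_name) as fp:\n",
        "        num_lines = sum(1 for _ in fp)\n",
        "    status = 'OK' if num_lines == expected else 'MISMATCH'\n",
        "    print(f'{file_name}: {num_lines} lines (expected {expected}) -> {status}')"
      ],
      "execution_count": null,
      "outputs": []
    },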
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cxWY9Ts-XEX9"
      },
      "source": [
        "## 6. Zip the submission files and download to make a submission"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "FQSr6ibGWV72",
        "outputId": "018e7db5-447b-4857-d4c3-32b4c0112c1f"
      },
      "source": [
        "!zip bert_base_uncased-submission.zip leaderboard/glue/standard/bert_base_uncased/*"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "  adding: leaderboard/glue/standard/bert_base_uncased/AX.tsv (deflated 82%)\n",
            "  adding: leaderboard/glue/standard/bert_base_uncased/CoLA.tsv (deflated 64%)\n",
            "  adding: leaderboard/glue/standard/bert_base_uncased/MNLI-mm.tsv (deflated 83%)\n",
            "  adding: leaderboard/glue/standard/bert_base_uncased/MNLI-m.tsv (deflated 83%)\n",
            "  adding: leaderboard/glue/standard/bert_base_uncased/MRPC.tsv (deflated 64%)\n",
            "  adding: leaderboard/glue/standard/bert_base_uncased/QNLI.tsv (deflated 85%)\n",
            "  adding: leaderboard/glue/standard/bert_base_uncased/QQP.tsv (deflated 73%)\n",
            "  adding: leaderboard/glue/standard/bert_base_uncased/RTE.tsv (deflated 84%)\n",
            "  adding: leaderboard/glue/standard/bert_base_uncased/SST-2.tsv (deflated 64%)\n",
            "  adding: leaderboard/glue/standard/bert_base_uncased/STS-B.tsv (deflated 56%)\n",
            "  adding: leaderboard/glue/standard/bert_base_uncased/WNLI.tsv (deflated 62%)\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cc66ysgrWv12"
      },
      "source": [
        "Download the zip file from \"Files\" menu.  \n",
        "To submit the file to the GLUE system, refer to their webpage.\n",
        "https://gluebenchmark.com/"
      ]
    },
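    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Alternatively, a minimal sketch (assuming this notebook is running on Google Colab) that triggers the download programmatically with `google.colab.files` instead of the \"Files\" menu:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Minimal sketch: download the submission zip created above (Google Colab only).\n",
        "from google.colab import files\n",
        "\n",
        "files.download('bert_base_uncased-submission.zip')"
      ],
      "execution_count": null,
      "outputs": []
    },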
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AxE8LY2E3Z78"
      },
      "source": [
        "## 7. More sample configurations, models, datasets...\n",
        "You can find more [sample configurations](https://github.com/yoshitomo-matsubara/torchdistill/tree/master/configs/sample/) in the [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) repository.  \n",
        "If you would like to use larger datasets e.g., **ImageNet** and **COCO** datasets and models in `torchvision` (or your own modules), refer to the [official configurations](https://github.com/yoshitomo-matsubara/torchdistill/tree/master/configs/official) used in some published papers.  \n",
        "Experiments with such large datasets and models will require you to use your own machine due to limited disk space and session time (12 hours for free version and 24 hours for Colab Pro) on Google Colab.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0BEXt2243OE9"
      },
      "source": [
        "# Colab examples for knowledge distillation\n",
        "You can find Colab examples for knowledge distillation experiments in the [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) repository."
      ]
    }
  ]
}